Published by Scot Douglas. Modified over 9 years ago.
1
caBIG Workflow
University of Chicago, USA
University of Manchester, UK
2
Agenda
caBIG Workflows: The “BIG” picture
caBIG Workflow Infrastructure
Semantic service discovery
Composing the workflow
Invoking stateful and secure services
Workflow execution service
Discovering and executing caBIG workflows using the caGrid portal
Examples of caBIG workflows
Future directions
3
The caGrid ecosystem and the role of workflow
[Figure labels: caGrid (data, instruments, computation resource); Virtualization, Security, Connectivity, Discovery, Composition, Orchestration, Reuse; Community; Scientific workflow lifecycle (reuse, generate)]
Workflow as consumer: easily reuse services for complex experiments.
Workflow as contributor: workflow as “best practice” wrapped as services.
4
The caBIG Workflow System
[Figure labels: caGrid; Discovery, Composition, Execution, Reuse; Community (reuse, generate)]
Service discovery based on cancer research metadata
Data-flow modeling flavor
caGrid activity
State management (WSRF)
Security (GSI)
Implicit iteration: handle parallel execution
WSRF and GSI enforcement
A “Facebook” for caGrid workflows
Workflow Execution Service
Workflows in caGrid Portal
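The implicit iteration mentioned above can be pictured with a small sketch. This is not Taverna code: `implicit_iterate` and `normalize` are hypothetical names, and a thread pool is only one possible stand-in for parallel invocation. The idea shown is that when a step expecting a single item receives a list, the step runs once per element and the results are collected in input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def implicit_iterate(step: Callable[[Any], Any], value: Any, max_workers: int = 4) -> Any:
    """Run `step` once per element if `value` is a list, else once on `value`."""
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map preserves input order even though calls may overlap in time.
            return list(pool.map(step, value))
    return step(value)


def normalize(sample_id: str) -> str:
    """Hypothetical single-item step standing in for a service call."""
    return sample_id.strip().upper()


if __name__ == "__main__":
    print(implicit_iterate(normalize, " s1 "))
    print(implicit_iterate(normalize, ["s1", " s2", "s3 "]))
```

The workflow author writes the step for one item only; the list handling lives in one place.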
5
Lymphoma Prediction Workflow
Scientific value
  Use gene-expression patterns associated with two lymphoma types to predict the type of an unknown sample.
  Connect a caGrid data service (caArray) with analytical services (PreProcess, SVM and KNN from GenePattern).
Major steps
  Querying training data from experiments stored in caArray.
  Preprocessing, i.e., normalizing the microarray data.
  Predicting lymphoma type using the SVM and KNN services.
Extension
  The workflow was generalized into a cancer type prediction routine that can be used on other caArray data sets.
*Fig. from MA Shipp, Nature Medicine, 2002
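The preprocess-then-predict shape of the major steps can be sketched locally. This is a toy illustration only: the real workflow calls caArray and GenePattern services, whereas everything here runs in-process. The function names, the z-score normalization, the labels "type A" and "type B", and the expression values in the usage note are all assumptions made up for the example. Only the KNN step is sketched; SVM is left out.

```python
import math
from statistics import mean, pstdev


def normalize(profile):
    """Z-score one expression profile (a stand-in for the preprocessing step)."""
    mu, sigma = mean(profile), pstdev(profile)
    if sigma == 0:
        return [0.0 for _ in profile]
    return [(x - mu) / sigma for x in profile]


def predict_knn(training, sample, k=1):
    """Majority vote among the k training profiles closest to `sample`.

    With two classes, use an odd k so the vote cannot tie.
    """
    ranked = sorted(training, key=lambda item: math.dist(item[0], sample))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


def run_pipeline(raw_training, raw_sample, k=1):
    """Normalize the training profiles and the sample, then classify the sample."""
    training = [(normalize(profile), label) for profile, label in raw_training]
    return predict_knn(training, normalize(raw_sample), k)
```

For example, with made-up training profiles `[([1, 2, 9], "type A"), ([2, 1, 8], "type A"), ([9, 8, 1], "type B"), ([8, 9, 2], "type B")]`, calling `run_pipeline(training, [1, 1, 7])` assigns the unknown sample to the class whose normalized profiles lie nearest.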
6
Lymphoma Prediction Workflow
[Figure labels: microarray from tumor tissue; microarray preprocessing; lymphoma prediction]
7
Lymphoma type prediction
Acknowledgement: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI); Jared Nedzel (MIT)
8
caGrid Workflow Infrastructure
Semantic-based service query
Workflow composition
9
Configuring Taverna
Two default caGrid configurations in Taverna:
  NCI Production caGrid v1.3
  Training caGrid
Configuration: a set of caGrid services belonging to the same grid
Other “caGrids” can be defined through preferences
10
Semantic Service Discovery
Semantic search queries the Index Service for registered caGrid services matching various search criteria: service name, inputs, outputs, research center, class names, concept codes, etc.
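Matching on several criteria at once can be illustrated with an in-memory filter. This sketch does not talk to the Index Service: `ServiceEntry`, `search`, the center names, and the concept codes are hypothetical placeholders. It only shows the idea that each supplied criterion narrows the result set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceEntry:
    """Hypothetical metadata record for one registered service."""
    name: str
    research_center: str
    concept_codes: frozenset = field(default_factory=frozenset)


def search(registry, name_contains=None, research_center=None, concept_code=None):
    """Return entries that satisfy every criterion that was supplied."""
    hits = []
    for entry in registry:
        if name_contains and name_contains.lower() not in entry.name.lower():
            continue
        if research_center and entry.research_center != research_center:
            continue
        if concept_code and concept_code not in entry.concept_codes:
            continue
        hits.append(entry)
    return hits


# Made-up registry contents; the codes are placeholders, not real concept codes.
REGISTRY = [
    ServiceEntry("CaArraySvc", "Center A", frozenset({"C0001"})),
    ServiceEntry("PreProcess", "Center B", frozenset({"C0002"})),
    ServiceEntry("SVMService", "Center B", frozenset({"C0002", "C0003"})),
]
```

Calling `search(REGISTRY, concept_code="C0002")` returns both Center B entries, and adding `name_contains="svm"` narrows that to one.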
11
Adding caGrid services directly
If the user knows the WSDL URL of a caGrid service, the service can be added directly.
12
caBIG services palette
After a semantic search or a direct add, caBIG services appear in Taverna’s Service Panel.
They are ready to be dragged and dropped into caGrid workflows.
13
Stateful caGrid services
Taverna provides support for stateful caGrid services that implement the WSRF spec.
Taverna can detect whether a service is WSRF-compliant and, if so, adds a special input port, ‘EndpointReference’, to it.
The EPR can be passed around the workflow as a normal parameter.
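Passing an EPR as an ordinary parameter can be illustrated with a toy stateful service. This is a sketch under assumptions, not WSRF or Taverna code: `EndpointReference` here is a simplified stand-in holding a service address and a resource key, and `CounterService` with its methods is invented for the example.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointReference:
    """Hypothetical stand-in for a WSRF EPR: service address plus resource key."""
    address: str
    resource_key: str


class CounterService:
    """Toy stateful service; each resource keeps its own running total."""

    def __init__(self, address):
        self.address = address
        self._totals = {}
        self._ids = itertools.count(1)

    def create_resource(self):
        """Create a new resource and return the EPR that identifies it."""
        key = f"res-{next(self._ids)}"
        self._totals[key] = 0
        return EndpointReference(self.address, key)

    def add(self, epr, amount):
        """Add to the total of the resource that `epr` points at."""
        # The EPR selects which resource's state the call operates on.
        self._totals[epr.resource_key] += amount
        return self._totals[epr.resource_key]
```

Two EPRs obtained from the same service address different state, so a later step that receives one of them continues exactly where the earlier step left off.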
14
Secure caGrid services
Taverna can invoke secure caGrid services that require the user to log in to caGrid.
Taverna interacts with caGrid’s GAARDS infrastructure to obtain the user’s proxy:
  Authenticate the user with the user’s affiliated Authentication Service
  Obtain the user’s proxy from the Dorian Service
Default proxy lifetime: 12 hours
15
Using secure caGrid services
Involves:
1. Configuring a secure caGrid service from Taverna
2. Logging onto the selected caGrid to obtain a proxy certificate
3. Saving and managing caGrid proxies, usernames, and passwords
16
Configuring secure services (1/2)
The Authentication Service and Dorian Service URLs are required in order to obtain the user’s proxy.
They can be configured globally for all services from the same caGrid (in preferences).
They can be configured individually for a particular caGrid service (this overrides the configuration from preferences).
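The override rule described above (per-service settings win over grid-wide preferences) can be sketched as a dictionary merge. The function name, the setting keys, and the example.org URLs are hypothetical; Taverna’s actual preference storage is not shown here.

```python
def resolve_security_settings(grid_defaults, service_overrides, grid_name, service_url):
    """Start from the grid-wide settings, then apply any per-service overrides."""
    settings = dict(grid_defaults.get(grid_name, {}))
    # Keys present in the per-service configuration replace the grid-wide values.
    settings.update(service_overrides.get(service_url, {}))
    return settings


# Hypothetical grid-wide preferences.
GRID_DEFAULTS = {
    "training": {
        "authentication_service_url": "https://auth.example.org/training",
        "dorian_service_url": "https://dorian.example.org/training",
    }
}

# Hypothetical override for one particular service.
SERVICE_OVERRIDES = {
    "https://svc.example.org/secure": {
        "dorian_service_url": "https://dorian.example.org/custom",
    }
}
```

A service without its own entry simply inherits both URLs from its grid.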
17
Configuring secure services (2/2)
View a secure service’s details
Configure the service’s security properties
18
Logging onto caGrid
The user is prompted for their caGrid username and password the first time any secure service is invoked from a workflow.
19
Credential management (1/2)
Taverna obtains a proxy for the user from the Dorian Service using the user’s caGrid username and password.
Proxies are saved and managed by the Credential Manager.
The caGrid username and password can also be remembered.
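Saving a proxy and reusing it until it expires can be sketched as a small cache. This is an illustration under assumptions, not the Credential Manager’s implementation: `ProxyCache`, `fetch_proxy`, and the injectable clock are hypothetical, and only the 12-hour default lifetime comes from the slides.

```python
from datetime import datetime, timedelta

PROXY_LIFETIME = timedelta(hours=12)  # default lifetime stated on the slides


class ProxyCache:
    """Keep one proxy per grid and fetch a new one only after expiry."""

    def __init__(self, fetch_proxy, now=datetime.now):
        self._fetch_proxy = fetch_proxy  # hypothetical callable: (username, password) -> proxy
        self._now = now                  # injectable clock, which makes expiry testable
        self._cache = {}                 # grid name -> (proxy, expiry time)

    def get(self, grid_name, username, password):
        """Return a cached proxy if still valid, otherwise obtain and store a new one."""
        cached = self._cache.get(grid_name)
        if cached and self._now() < cached[1]:
            return cached[0]
        proxy = self._fetch_proxy(username, password)
        self._cache[grid_name] = (proxy, self._now() + PROXY_LIFETIME)
        return proxy
```

With this shape the user is asked to authenticate once, and later invocations within the proxy lifetime reuse the stored proxy.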
20
Workflow execution service
Taverna Workflow Service wraps the Taverna execution engine into a WS-Resource and exposes operations such as createResource, startWorkflow, getStatus, and getOutput for user-submitted workflows.
[Figure labels: createResource, startWorkflow, getStatus, getOutput; Workflow Service; Stateful Resources (Resource Properties); EPR; Taverna Engine; Data Services; Analytical Services; caGrid & Other Services; Client API; Taverna Workbench; Workflow Portlet]
21
Workflow execution service
Taverna Workflow Service
  Provides stateful resources that execute the workflows.
  Supports the caGrid security architecture (GSI Security).
  Allows programmatic submission of workflows.
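The create, start, poll, and fetch lifecycle of a programmatic submission can be sketched against an in-memory mock. Only the four operation names come from the slides. `MockWorkflowService`, the string EPRs, the status values "Pending", "Active", and "Done", and the echoed outputs are assumptions for illustration, not the real service’s signatures or behavior.

```python
import itertools


class MockWorkflowService:
    """In-memory stand-in; not the real Taverna Workflow Service."""

    def __init__(self, polls_until_done=2):
        self._ids = itertools.count(1)
        self._resources = {}
        self._polls_until_done = polls_until_done

    def createResource(self, workflow_definition):
        """Create a stateful resource for one workflow and return its EPR."""
        epr = f"epr-{next(self._ids)}"
        self._resources[epr] = {"workflow": workflow_definition, "status": "Pending",
                                "inputs": None, "polls": 0}
        return epr

    def startWorkflow(self, epr, inputs):
        resource = self._resources[epr]
        resource["inputs"] = list(inputs)
        resource["status"] = "Active"

    def getStatus(self, epr):
        # The mock "finishes" after a fixed number of status checks.
        resource = self._resources[epr]
        if resource["status"] == "Active":
            resource["polls"] += 1
            if resource["polls"] >= self._polls_until_done:
                resource["status"] = "Done"
        return resource["status"]

    def getOutput(self, epr):
        resource = self._resources[epr]
        if resource["status"] != "Done":
            raise RuntimeError("workflow has not finished")
        # Echo the inputs so the sketch has something to return.
        return [f"result for {value}" for value in resource["inputs"]]


def run_workflow(service, workflow_definition, inputs, max_polls=10):
    """Submit a workflow, poll its status, and return the output once done."""
    epr = service.createResource(workflow_definition)
    service.startWorkflow(epr, inputs)
    for _ in range(max_polls):
        if service.getStatus(epr) == "Done":
            return service.getOutput(epr)
    raise TimeoutError("workflow did not finish within max_polls status checks")
```

A real client would wait between status checks; the mock omits the delay so the example stays deterministic.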
22
Access Taverna workflow via caGrid portal
The Taverna Workflow Portlet is deployed in the caGrid Portal on the training Grid.
URL: http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow
The portlet currently lists a few workflows with their descriptions, which can be browsed from the above URL.
Users can select a workflow they are interested in running.
View: 1
23
Access Taverna workflow via caGrid portal
URL: http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow
Based on the number of input ports in the workflow, the portlet prompts the user to enter the input values in text boxes.
For example, the Lymphoma workflow takes only one input, in the form of an Experiment ID that identifies the experiment that caArray uses for data collection.
Hit submit after entering the data.
View: 2
24
Access Taverna workflow via caGrid portal
URL: http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow
The portlet stores the user-submitted workflows in the current session of the portal.
Users can view all the Active and Completed workflows in the session.
Clicking the Output button shows the output of the workflow.
The portlet provides workflow-specific view resolvers to render the outputs. For example, the Lymphoma workflow currently displays its output in an HTML table.
Views: 3, 4, and 5
Ack. Manav Kher, Joshua Phillips (SemanticBits)
25
Workflow execution service plug-in
Submit the workflow to an execution service.
Retrieve the execution result asynchronously.
26
Examples of caBIG workflows: caDSR
Scientific value
  To find all the UML packages related to a given context (‘caCore’).
  Not a real scientific experiment. Simple. Important in caGrid.
Steps
  Querying the Project object.
  Doing data transformation.
  Querying the Packages object and getting the result.
[Figure legend: workflow input; caGrid services; “shim” services; workflow output]
27
Protein sequence information query
Scientific value
  To query protein sequence information from 3 caGrid data services: caBIO, CPAS and GridPIR.
  To analyze a protein sequence from different data sources.
Steps
  Querying CPAS to get the id, name, and value of the sequence.
  Querying caBIO and GridPIR using the id or name obtained from CPAS.
28
Microarray clustering*
Scientific value
  A common routine to group genes or experiments into clusters with similar profiles.
  To identify functional groups of genes.
Steps
  Querying and retrieving the microarray data of interest from a caArrayScrub data service at Columbia University.
  Preprocessing, i.e., normalizing the microarray data using the GenePattern analytical service at the Broad Institute at MIT.
  Running hierarchical clustering using the geWorkbench analytical service at Columbia University.
[Figure legend: workflow in/output; caGrid services; “shim” services; others]
*Wei Tan, Ravi Madduri, Kiran Keshav, Baris E. Suzek, Scott Oster, Ian Foster. Orchestrating caGrid Services in Taverna. ICWS 08.
29
Execution trace Execution result as xml 1936 gene expressions
30
caGrid workflows in myExperiment
caGrid workflows covered
  Data service workflows
    caDSR query
    Protein sequence query
  Data + analytical service workflows
    Microarray clustering
    Lymphoma type classification
caGrid workflows are uploaded to myExperiment and accessible from: http://www.myexperiment.org/workflows/search?query=cabig
31
Future Directions
More guidance in workflow modeling
  Leverage caDSR, EVS and the workflows at myExperiment
More friendly user interface
  A CQL builder for caGrid data services
  More shim services for data transformation
More features
  Integration with caGrid transfer to access data
  Browsing and executing workflows from caGrid portal
  Enhanced security support
More workflows of real scientific value
32
More information
caGrid workflow: http://cagrid.org/display/workflow/Home
Our team (Univ. Manchester, UK and Univ. Chicago): Carole Goble, Wei Tan, Dinanath Sulakhe, Stian Soiland-Reyes, Ravi Madduri, Alexandra Nenadic