Building and Running caGrid Workflows in Taverna

1 Computation Institute, University of Chicago and Argonne National Laboratory, Chicago, IL, USA
2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
3 School of Computer Science, University of Manchester, Manchester, UK

OVERVIEW

To help users from biological and biomedical domains create and execute their workflows efficiently, the caGrid Workflow team, together with the ICR working group, selected the Taverna workbench and built a tool suite that orchestrates caGrid Data and Analytical services for ICR workflows. The tool suite aims to provide an easy-to-use workflow authoring and submission tool that integrates caGrid services as well as third-party services into scientific workflows. We have also helped the caGrid community build several workflows of real scientific value, and we are committed to supporting caBIG users across workspaces in creating and executing their domain-specific workflows.

Web Resources:
Taverna:
caGrid Plug-in download:
caBIG:
caGrid Workflow Quick Start Guide:

End-to-End Solution for caGrid Workflow

Search the caGrid Index Service for registered caGrid services matching various search criteria: service name, inputs, outputs, research center, class names, concept codes, etc.

Application: Lymphoma Prediction Workflow *,[1]

Scientific value
Use gene-expression patterns associated with diffuse large B-cell lymphoma (DLBCL) and follicular lymphoma (FL) to predict the lymphoma type of an unknown sample. Use the GenePattern SVM and KNN services to build the tumor classification model and predict the tumor types of unknown samples.

Major steps
Extract Microarray. Query the training data and the unknown sample from experiments stored in caArray.
Preprocess Microarray. Preprocess (normalize) the microarray data for later processing.
Predict Lymphoma Type. Predict the lymphoma type using the SVM and KNN services.
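The three major steps can be sketched as a small local pipeline. This is only an illustrative stand-in under stated assumptions: the expression values are made-up toy numbers rather than caArray data, the function names are hypothetical, and a hand-written k-nearest-neighbour vote replaces the GenePattern KNN service (the SVM service is not modelled). In the actual workflow, each step is a caGrid service invoked from Taverna.

```python
import math
from collections import Counter


def extract_microarray():
    """Step 1 stand-in: return labelled training samples and one unknown sample.

    Toy values only; the real workflow queries experiments stored in caArray.
    """
    training = [
        ([8.0, 1.0, 7.5], "DLBCL"),
        ([7.5, 1.5, 8.0], "DLBCL"),
        ([8.5, 0.5, 7.0], "DLBCL"),
        ([1.0, 8.0, 2.0], "FL"),
        ([1.5, 7.5, 1.0], "FL"),
        ([0.5, 8.5, 1.5], "FL"),
    ]
    unknown = [7.8, 1.2, 7.1]
    return training, unknown


def preprocess(training, unknown):
    """Step 2 stand-in: z-score each gene using training-set mean and std."""
    n_genes = len(unknown)
    columns = [[values[g] for values, _ in training] for g in range(n_genes)]
    means = [sum(col) / len(col) for col in columns]
    # Fall back to 1.0 when a gene has zero variance to avoid division by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]

    def scale(values):
        return [(v - m) / s for v, m, s in zip(values, means, stds)]

    return [(scale(values), label) for values, label in training], scale(unknown)


def predict_knn(training, unknown, k=3):
    """Step 3 stand-in: majority vote among the k nearest training samples."""
    nearest = sorted(training, key=lambda s: math.dist(s[0], unknown))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def run_workflow():
    """Chain the three steps in the order listed under Major steps."""
    training, unknown = extract_microarray()
    training, unknown = preprocess(training, unknown)
    return predict_knn(training, unknown)


if __name__ == "__main__":
    print(run_workflow())
```

With the toy data above, the unknown sample sits next to the DLBCL training samples, so the vote returns DLBCL.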
Extension
The lymphoma prediction workflow was generalized into a cancer type prediction workflow and applied to Experiment 236 in the caArray database.[2]

[1] MA Shipp, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, 2002(8)
[2] S. Ramaswamy, et al. Multiclass cancer diagnosis using tumor gene expression signatures. PNAS, vol. 98, p ,

*Acknowledgement: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI); Jared Nedzel (MIT)

Semantic search
WSRF Support: Invoke stateful Grid services.
caGrid Security Support: Log on to a given Grid and configure a service's security properties with a caGrid credential.

Lymphoma prediction workflow
1. Extract Microarray
2. Preprocess Microarray
3. Predict Lymphoma Type

Available caGrid Workflows
caDSR data query
Protein sequence query
Microarray clustering
Lymphoma prediction
Cancer classification

caGrid workflows at myExperiment
workflows/search?query=cabig
“Facebook” for caGrid workflows

Result of the lymphoma prediction workflow
Result of the cancer type prediction over caArray Experiment 236