Designing, Executing and Reusing Scientific Workflows Katy Wolstencroft, Paul Fisher, myGrid
Lots of Scientific Resources NAR 2009 – over 1170 databases
Interoperability, Integration and Collaboration: Access to distributed and local resources; Iteration over data sets; Interactive; Automation of data flow; Agile software development; Experimental protocols; Taverna Workflows
Create and run workflows; Create and manage services as components (API Consumer); Share, discover and reuse workflows; Manage the metadata needed and generated (RDF, OWL); Discover and reuse services (Feta). Open Source Workflow Environment for Scientists
Taverna GUI and Enactor; Taverna Remote Execution service (T-REX). Graphical Workbench: Drag and drop interface; Plug-in architecture; Nested Workflows. Workflow Enactor: Local and remote enactor; Implicit iteration over data collections; Automation of data flow; Logging and data provenance tracking. Workflow Enactor Engine 2
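The "implicit iteration over data collections" point can be illustrated with a small sketch. This is not Taverna's enactor code: the function name implicit_iterate, the strategy names and the example service fetch_sequence are all hypothetical, and the sketch flattens results into one list rather than preserving nesting. It shows the general idea that a service written for single values is called once per item when it is handed a list, either over every combination of inputs ("cross") or over matched positions ("dot").

```python
from itertools import product


def implicit_iterate(service, *inputs, strategy="cross"):
    """Call `service` once per combination of list-valued inputs.

    Scalar inputs are treated as one-element lists. If no input was a
    list, the single result is returned unwrapped.
    """
    any_list = any(isinstance(value, list) for value in inputs)
    as_lists = [value if isinstance(value, list) else [value] for value in inputs]

    if strategy == "cross":
        # Every combination of items across all inputs.
        combos = product(*as_lists)
    elif strategy == "dot":
        # Items paired by position; all inputs must have the same length.
        if len({len(values) for values in as_lists}) != 1:
            raise ValueError("dot strategy needs inputs of equal length")
        combos = zip(*as_lists)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    results = [service(*combo) for combo in combos]
    return results if any_list else results[0]


def fetch_sequence(database, identifier):
    """Hypothetical single-item service: builds a label, no network access."""
    return f"{database}:{identifier}"


if __name__ == "__main__":
    # One scalar input and one list input: the service runs once per identifier.
    print(implicit_iterate(fetch_sequence, "db1", ["id1", "id2"]))
    # Two list inputs, matched by position instead of all combinations.
    print(implicit_iterate(fetch_sequence, ["db1", "db2"], ["id1", "id2"], strategy="dot"))
```

The workflow author writes fetch_sequence for one identifier only; the looping lives in the enactor, which is what makes the iteration "implicit".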
Types of Service Many different types can be incorporated into Taverna by providing the URL – No coding Activity WSDL Activity Bury interaction Activity Workflow Enactor Engine Activity
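One way to picture how different service types can sit behind a common "Activity" is the sketch below. It is an illustration under stated assumptions, not Taverna's API: the classes Activity, LocalFunctionActivity and RemoteServiceActivity, the run_pipeline helper, the fake_transport function and the example URL are all hypothetical, and no network call is made.

```python
from abc import ABC, abstractmethod


class Activity(ABC):
    """Common interface the engine calls, whatever the service type."""

    @abstractmethod
    def invoke(self, inputs):
        """Take a dict of named inputs and return a dict of named outputs."""


class LocalFunctionActivity(Activity):
    """Wraps a plain Python function as an activity."""

    def __init__(self, func):
        self.func = func

    def invoke(self, inputs):
        return {"value": self.func(inputs["value"])}


class RemoteServiceActivity(Activity):
    """Stands in for a service identified only by its URL.

    The `transport` callable is injected so the sketch stays offline.
    """

    def __init__(self, url, transport):
        self.url = url
        self.transport = transport

    def invoke(self, inputs):
        return self.transport(self.url, inputs)


def run_pipeline(activities, value):
    """Pass one value through the activities in order (a linear data flow)."""
    for activity in activities:
        value = activity.invoke({"value": value})["value"]
    return value


def fake_transport(url, inputs):
    """Hypothetical transport: records which URL would have been called."""
    return {"value": f"{inputs['value']} via {url}"}


if __name__ == "__main__":
    steps = [
        LocalFunctionActivity(str.upper),
        RemoteServiceActivity("http://example.org/service?wsdl", fake_transport),
    ]
    print(run_pipeline(steps, "acgt"))
```

Because the engine only sees the invoke method, adding a new service type means adding a new Activity subclass, while the user supplies nothing more than a URL.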
Finding and Curating Services
What do Scientists use Taverna for? Data gathering, annotation and model building; Data analysis from distributed tools; Data mining and knowledge management; Data curation and warehouse population; Parameter sweeps and simulation. Users from: Systems Biology, Proteomics, Sequence analysis, Protein structure prediction, Gene/protein annotation, Microarray data analysis, QTL studies, Cheminformatics, Medical image analysis, Public Health care epidemiology, Heart model simulation, Phenotype studies, Phylogeny, Statistical analysis, Pharmacogenomics, Text mining, Astronomy, Music, Meteorology
Systems Biology: Integration of microarray data onto SBML Models. Data analysis; Manipulation of SBML models; libSBML incorporated into Taverna through the Java API Consumer. Peter Li, Doug Kell, University of Manchester
Data Analysis: Pharmacogenomics. Association study of Nevirapine-induced skin rash in Thai Population. A systemic (bodywide) allergic reaction with a characteristic rash. 100 cases (rash), 100 controls (no rash). 10,000 SNPs significantly associated with rash. Pathway analysis and systems biology; Prioritising SNPs; Functional studies; Diagnostic tools
Taverna in caGrid: caGrid Scavenger with semantic/metadata based caGrid service query; caGrid workflow for microarray analysis, using caArray, GenePattern and geWorkbench [Ravi Madduri]. Orchestrating caGrid Services in Taverna. Wei Tan, Ravi Madduri, Kiran Keshav, Baris E. Suzek, Scott Oster, Ian Foster. Proc IEEE Intl Conf on Web Services (ICWS 2008)
Sharing Experiments: Taverna supports the in silico experimental process for individual scientists. How do you share your results/experiments/experiences with your research group, collaborators and the scientific community? How do you compare your results with others produced by e.g. Kepler / Triana / Trident?
Recycling, Reuse, Repurposing: Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis. Paul meets Jo. Jo is investigating mouse Whipworm infection. Jo reuses one of Paul’s workflows without change. Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. Previously a manual two-year study by Jo had failed to do this. Workflows are protocols.
myExperiment Features: User Profiles, Groups, Friends, Sharing, Tags, Developer interface, Workflows, Credits and Attributions, Fine control over privacy, Packs, Federation, Enactment
Just Enough Sharing... And Credit! myExperiment allows you to credit others’ contributions. myExperiment allows you to say: who can look at/download or modify your workflow; who can run your workflow. Enactment extends accessibility: Taverna is for informaticians; myExperiment is for informaticians AND laboratory scientists.
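The "who can look at/download, modify or run" distinction can be sketched as a per-action permission check. This is a minimal illustration, not myExperiment's data model: the WorkflowPolicy class, its field names and the action names are assumptions made for the example, and the user names are borrowed from the Paul and Jo story purely as placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical action names, mirroring the permissions listed on the slide.
ACTIONS = ("view", "download", "modify", "run")


@dataclass
class WorkflowPolicy:
    """Per-workflow sharing settings plus a list of credited contributors."""

    owner: str
    credits: List[str] = field(default_factory=list)
    grants: Dict[str, Set[str]] = field(
        default_factory=lambda: {action: set() for action in ACTIONS}
    )

    def grant(self, user: str, action: str) -> None:
        """Allow `user` to perform one specific action."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.grants[action].add(user)

    def allows(self, user: str, action: str) -> bool:
        """The owner may do anything; others need an explicit grant."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        return user == self.owner or user in self.grants[action]


if __name__ == "__main__":
    policy = WorkflowPolicy(owner="paul", credits=["paul"])
    policy.grant("jo", "view")
    policy.grant("jo", "run")
    print(policy.allows("jo", "run"))     # jo was granted run
    print(policy.allows("jo", "modify"))  # jo was not granted modify
```

Keeping "run" separate from "download" is what lets a laboratory scientist execute a shared workflow without needing to open or edit it.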
The myGrid Team
More Information: myGrid, Taverna, myExperiment, BioCatalogue. Thanks to Carole Goble, David De Roure and Jiten Bhagat for slide contributions.