Download presentation
Presentation is loading. Please wait.
Published byLionel Cunningham Modified over 9 years ago
1
Wf4Ever: Preserving workflows as digital Research Objects EGI Community Forum 2012, Workflow Systems workshop Leibniz Supercomputing Centre, Münich, 2012-03-28 Stian Soiland-Reyes myGrid, University of Manchester
2
2 My background myExperiment - Web 3.0 virtual environment, library and social network for workflows ~5000 registered users ~2200 workflows ~21 different systems Taverna - Scientific Workflow Management System ~85000 downloads ~EU projects: SCAPE, BioVeL, HELIO, e-Lico, VPH-SHARE, EGI-INSPiRE…. http://www.myexperiment.org/ http://www.taverna.org.uk/
3
“A biologist would rather share their toothbrush than their gene name” Mike Ashburner and others Professor in Dept of Genetics, University of Cambridge, UK
4
“Facebook for Scientists”...but different to Facebook! A repository of research methods A social network of people and things A Social Virtual Research Environment A probe into researcher behaviour Open source (BSD) Ruby on Rails app REST and SPARQL, Linked Data Influenced BioCatalogue, MethodBox and SysMO-SEEK myExperiment currently has 5378 members, 292 groups, 2273 workflows, 534 files and 217 packs http://www.myexperiment.org/
6
http://www.wf4ever-project.org/ Workflow Preservation Research Objects Provenance Recommendation Astronomy and Genomics Workflow Preservation Research Objects Provenance Recommendation Astronomy and Genomics
7
7 » Scientific workflows enable automation of scientific methods and encourage best practices to be shared » Workflows need to be preserved for › Reuse, fundamental for incremental scientific development › Method reproducibility, key for credit and publication » Workflow preservation is complex! » Heterogeneous types of information need to be aggregated, including workflows and related resources forming research objects » Research objects need to be trusted and understandable n years from now » Social aspects need to be addressed in order to support reuse in scientific communities Challenges Wf4Ever Preservation of scientific workflows in data-intensive science
8
Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes. Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object. Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later. Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed. Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years. Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable. Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results. Respectful. Explicit representations of the provenance, lineage and flow of intellectual property. The R.* dimensions Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.nature.com/eresearch/http://blogs.nature.com/eresearch/
9
9 Forms of decay Wf4Ever Workflow Decay Service decay Flux/decay/unavailability Data decay Formats/ids/standards Infrastructure decay platform/resources Experiment Decay Methodological changes New technologies New resources/components New data
10
10 Preservation, Conservation, Recreating Preserving Archived Record Fixed Snapshots Review Rerun & Replay Conserving Active Instrument Live Rerun & Reuse Repair & Restore Recreating Archived Record Active Instrument Live Rebuild Recycle Repurpose
11
11 http://www.gridworkflow.org/kwfgrid/gwes/docs/ Flux Redo Decay at different abstraction levels Workflow Decay
12
12 Research objects
13
13 Research Objects as Social Objects
14
14 Research Object model core (simplified) http://purl.org/wf4ever/ro# ro:Resource ro:ResearchObject ro:Manifest ro:AggregatedAnnotation ore:aggregates ro:annotatesAggregatedResource wfdesc:Workflow ore:isDescribedBy Note: This figure shows a simplified view of the RO core. RO specification: http://wf4ever.github.com/ro/
15
15 Research Object model core http://purl.org/wf4ever/ro#
16
16 RO model: Workflow Description http://purl.org/wf4ever/wfdesc#
17
17 Workflow Provenance (wfprov) http://purl.org/wf4ever/wfprov#
18
18 Technical infrastructure Models Semantic Web Encoding Research Object Annotation Provenance Evolution and Versioning Services Web APIs, REST services Foundational, Extension, User APIs, Architecture Principles Map into standards Adopt standards Lightweight components Ecosystem Command line Portal Third party systems
19
19 Foundation Services Extension Services User Clients Services The Wf4Ever Proposal
20
20 Lifecycle Services Storage Services Wf4Ever Reference Implementation Access & Usage Clients Data Management & Analysis Services Stability Evaluation Completeness Evaluation Recommender RO PortalRO Manager Tool RO Digital Library ROBox Dropbox Client Prototype, Dec 2011 Taverna Workflow Mgmt System
21
21 Year 1 (Dec 2010 Dec 2011) Roadmap » Exploration (2011) Problem specification and requirements identification Better understanding of workflow preservation needs from the domains (what does it mean to preserve a scientific workflow?) Proofs of concepts Preliminary models, components, and integrated reference implementation Result identification
22
22 Year 2 (Dec 2011 Dec 2012) Roadmap Realization/validation (2012) › Validate the models, architectures and software in practice › Distributed components with different access/security arrangements – forming REST APIs and specifications › RO Content Campaign: Generate 1000s of ROs › First productization phase: Stable releases of models and reference implementation › Decay monitoring and notification (why my wf is no longer stable), reacting to decay, attribution and credit support beyond recommendation. Detailed use of provenance › Execution and interoperability support (SHIWA integration)
23
23 Year 3 (Dec 2012 Dec 2013) Roadmap » Exploitation (2013) › Final productization phase › Deployment in user environments and systems, enhanced with workflow preservation capabilities › RO-enabled myExperiment › RO-enabled Galaxy › RO-enabled dataVerse › … and more! › Deployment in publishers e.g. Elsevier, Digital Science, GigaScience
24
24 Collaborations and impact » SHIWA – Sharing Interoperable Workflows » Publishers/journals: Elsevier, GigaScience (by BGI) » OpenPHACTS (nanopublications) » SCAPE (dataset preservation) » BioVel (biodiversity - species preservation!) » Dataverse (data repository) » Galaxy (workflow system for genomics) » GenomeSpace (data integration platform)
25
25 Thank you! Any Questions? http://www.wf4ever-project.org/ This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.