MyExperiment 2.0 – Preserving digital Research Objects using the Wf4Ever architecture EGI/SHIWA Workshops on e-Science Workflows Budapest, 2012-02-10 Stian.

Presentation transcript:

myExperiment 2.0 – Preserving digital Research Objects using the Wf4Ever architecture
EGI/SHIWA Workshops on e-Science Workflows, Budapest, 2012-02-10
Stian Soiland-Reyes, myGrid, University of Manchester

2 Background
Workflow-based Science
» myExperiment - Web 3.0 virtual environment, library and social network for workflows
› ~5000 registered users
› ~2200 workflows
› ~21 different systems
» Taverna - Scientific Workflow Management System
› ~85000 downloads
› ~EU projects: SCAPE, BioVeL, HELIO, e-Lico, VPH-SHARE, EGI-INSPiRE…

3 What is a Scientific Workflow?
Workflow-based Science
» Workflows coordinate the execution of services and link together resources.
» Data-driven rather than process-driven: «Send output from A to B and C»
» Semi-automated computational execution in scientific problem-solving → repeatable, reproducible, reusable
» The implementation of a scientific method
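The data-driven idea on this slide («Send output from A to B and C») can be illustrated with a minimal sketch. It is not how Taverna or any other workflow system is implemented; `run_dataflow`, the step names and the toy functions are all hypothetical. The sketch only shows that a step fires once the data it depends on has arrived, rather than at a fixed position in a control sequence.

```python
from collections import defaultdict


def run_dataflow(steps, links, inputs):
    """Run each step once all of its upstream outputs have arrived.

    steps:  name -> function taking a list of upstream values
    links:  (source, target) pairs, e.g. ("A", "B")
    inputs: name -> initial value for steps with no upstream link
    (All names are illustrative assumptions, not a real workflow API.)
    """
    upstream = defaultdict(list)
    for source, target in links:
        upstream[target].append(source)

    results = {}
    pending = list(steps)
    while pending:
        progressed = False
        for name in list(pending):
            sources = upstream[name]
            if all(s in results for s in sources):
                # Steps without upstream links read their value from `inputs`.
                args = [results[s] for s in sources] or [inputs[name]]
                results[name] = steps[name](args)
                pending.remove(name)
                progressed = True
        if not progressed:
            raise ValueError("cycle or missing step: " + ", ".join(pending))
    return results


# "Send output from A to B and C"
steps = {
    "A": lambda values: values[0].upper(),
    "B": lambda values: values[0] + "!",
    "C": lambda values: len(values[0]),
}
links = [("A", "B"), ("A", "C")]
results = run_dataflow(steps, links, {"A": "acgt"})
```

Here B and C both receive the output of A, and the order in which they run is decided by data availability alone.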

Kepler Triana BPEL Taverna Trident Meandre Galaxy

5 Reuse, Recycling, Repurposing
» Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle.
» Paul meets Jo. Jo is investigating whipworm in mouse.
» Jo reuses one of Paul’s workflows without change.
» Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.
» Previously a manual two year study by Jo had failed to do this.

“A biologist would rather share their toothbrush than their gene name” Mike Ashburner and others Professor in Dept of Genetics, University of Cambridge, UK

» “Facebook for Scientists”...but different to Facebook!
» A repository of research methods
» A social network of people and things
» A Social Virtual Research Environment
» A probe into researcher behaviour
» Open source (BSD) Ruby on Rails app
» REST and SPARQL, Linked Data
» Influenced BioCatalogue, MethodBox and SysMO-SEEK
myExperiment currently has 5378 members, 292 groups, 2273 workflows, 534 files and 217 packs

10 myExperiment API: APIs for developers
[Architecture diagram. Labels: Search Engine, reviews, ratings, groups, friendships, tags, Enactor, files, workflows, HTML, RDF Store, SPARQL endpoint, Managed REST API, facebook, iGoogle, android, XML API, config, mySQL, profiles, packs, credits]

11 Taverna integration
myExperiment API
» myExperiment plugin for Taverna
› Browse myExperiment workflows: My workflows, Tags, Search
› Open workflow + Embed in existing workflow
› Upload workflows: Provide metadata


Paul’s Pack / Paul’s Research Object
[Diagram. Items: Workflow 16, Workflow 13, Results, Logs, Metadata, Paper, Slides, Common pathways, QTL. Relations: feeds into, produces, included in, published in]

The R.* dimensions
» Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes.
» Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object.
» Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later.
» Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed.
» Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years.
» Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable.
» Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results.
» Respectful. Explicit representations of the provenance, lineage and flow of intellectual property.
“Replacing the Paper: The Twelve Rs of the e-Research Record” on
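The "Reproducible" dimension (same inputs, same method, check whether a prior result can be confirmed) can be sketched in a few lines. This is only an illustration under simple assumptions: the method is a deterministic Python function, its inputs and output are JSON-serialisable, and the helper names `digest`, `record_run` and `reproduces` are hypothetical rather than part of any Research Object tooling.

```python
import hashlib
import json


def digest(value):
    """Stable SHA-256 digest of a JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def record_run(method, inputs):
    """Store the inputs and a digest of the output of one run."""
    return {"inputs": inputs, "output_digest": digest(method(**inputs))}


def reproduces(method, record):
    """Re-run `method` on the recorded inputs and compare output digests."""
    return digest(method(**record["inputs"])) == record["output_digest"]


def mean(values):
    return sum(values) / len(values)


record = record_run(mean, {"values": [1, 2, 3]})
same_method_confirms = reproduces(mean, record)
other_method_confirms = reproduces(lambda values: max(values), record)
```

Real studies rarely reduce to a single deterministic function, so a check like this is at best one ingredient; the other dimensions (Revealable, Replayable) need the recorded steps themselves, not just a digest.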

Pack analysis
Analysis by Sean Bechhofer
» Workflow – pack contains a number of workflows
» Presentation - encapsulation of a single presentation
» Collection - a number of things: workflows, presentations, papers
» Heterogeneous - where the workflows do not appear to have a clear common purpose
» Homogeneous - workflows appear to be designed to work together
» Paper - source for a paper
» Tutorial - tutorial material
» Data - collection of data files
» Derived data - results of workflow
» Benchmark - benchmarking data
» Supplementary - stuff associated with a paper
» Noise - tests, tryouts, rubbish
» Oddity - none of the above

» Workflow Preservation
» Research Objects
» Provenance
» Recommendation
» Astronomy and Genomics

17 Challenges
Wf4Ever: Preservation of scientific workflows in data-intensive science
» Scientific workflows aim at the heart of experimental science
› Enable automation of scientific methods
› Encourage best practices
» Need to be preserved
› Reuse is fundamental for incremental scientific development
› Method reproducibility is key for credit and publication
» …but workflow preservation is complex
› Heterogeneous types of information, including workflows and related resources, need to be aggregated into research objects
› Research objects need to be trusted and understandable n years from now
› Social aspects need to be addressed in order to support reuse in scientific communities

18 Wf4Ever
Stability, Completeness, Integrity, Authenticity, Quality
» Workflow Decay
› Component level: flux/decay/unavailability
› Data level: formats/ids/standards
› Infrastructure level: platform/resources
» Experiment Decay
› Methodological changes
› New technologies
› New resources/components
› New data
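The three workflow decay levels on this slide (component, data, infrastructure) suggest a simple way to report why a workflow no longer runs. The sketch below is a hypothetical illustration, not a Wf4Ever service: `Dependency`, `DecayReport` and `assess_decay` are invented names, and the `available` flag stands in for whatever external check (service ping, format validation, platform test) a real system would perform.

```python
from dataclasses import dataclass, field


@dataclass
class Dependency:
    name: str
    level: str       # "component", "data" or "infrastructure"
    available: bool  # result of an external check, supplied by the caller


@dataclass
class DecayReport:
    workflow: str
    broken: dict = field(default_factory=dict)  # level -> dependency names

    @property
    def decayed(self):
        return bool(self.broken)


def assess_decay(workflow, dependencies):
    """Group every unavailable dependency by the level at which it decayed."""
    report = DecayReport(workflow)
    for dep in dependencies:
        if not dep.available:
            report.broken.setdefault(dep.level, []).append(dep.name)
    return report


# Hypothetical dependencies of a hypothetical workflow.
dependencies = [
    Dependency("alignment_service", "component", available=False),
    Dependency("sequence_format", "data", available=True),
    Dependency("compute_node", "infrastructure", available=False),
]
report = assess_decay("example-workflow", dependencies)
```

Grouping by level keeps the question "why did it decay?" separate from the question "did it decay?", which matters when different levels call for different repairs (substituting a service versus migrating a data format).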

19 Flux Redo Decay at different abstraction levels Workflow Decay

20 Research Objects as Social Objects

21 Research objects

22 Carriers of Research Context
Discovery, Sharing, Curation, Reusing, Preserving
» Citable
» Aggregation, Dispersed
› Heterogeneous
› Local and External
» Annotated metadata
› Provenance
› Structured: Manifests, Recipes, Permissions, Discourse
» Lifecycle
› Publishing, Evolution
› Versioning, Preserving
» Mixed Stewardship
› Graceful Degradation
» Sharing
» Security & Privacy
» Stereotypical Profiles
» Services
Distributed Third Party Tenancy Alien Store
Technical Objects Social Objects
Encodings: Semantic Web: LOD, VoID, OAI-ORE, AO, SIOC, OPM….
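A few of the properties listed on this slide (citable, an aggregation of local and external resources, annotated metadata, a manifest) can be made concrete with a small data-structure sketch. It is an assumption-laden illustration in plain Python, not the Wf4Ever Research Object model or its Semantic Web encoding; the class names, field names, URIs and identifier scheme are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    uri: str
    kind: str               # e.g. "workflow", "data", "paper"
    external: bool = False  # True if stored outside the research object


@dataclass
class Annotation:
    target: str  # URI of the annotated resource
    body: dict   # free-form metadata, e.g. provenance statements


@dataclass
class ResearchObject:
    identifier: str  # a citable identifier (hypothetical scheme)
    resources: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

    def aggregate(self, resource):
        self.resources.append(resource)

    def annotate(self, target, **body):
        # Only resources already aggregated by this object may be annotated.
        if target not in {r.uri for r in self.resources}:
            raise KeyError(target)
        self.annotations.append(Annotation(target, body))

    def manifest(self):
        """Plain-dict summary of what the object aggregates and says about it."""
        return {
            "id": self.identifier,
            "aggregates": [r.uri for r in self.resources],
            "external": [r.uri for r in self.resources if r.external],
            "annotations": [(a.target, a.body) for a in self.annotations],
        }


ro = ResearchObject("ro:example-pack")
ro.aggregate(Resource("workflows/16", "workflow"))
ro.aggregate(Resource("http://example.org/paper.pdf", "paper", external=True))
ro.annotate("workflows/16", produced="results/qtl.csv")
manifest = ro.manifest()
```

Keeping annotations separate from the aggregated resources means the manifest can still be produced when an external resource has disappeared, which is one reading of "Graceful Degradation" under mixed stewardship.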

23 Research Object Core (ro) Research Object model

24 Workflow Description (wfdesc) Research Object model

25 Workflow Provenance (wfprov) Research Object model

26 Technical Infrastructure The Wf4Ever Proposal Models Research Object Annotation Provenance Evolution and Versioning Semantic Web Encoding Services Foundational, Extension, User APIs, Architecture Web protocols/services Principles Map into standards Adopt standards Lightweight components Ecosystem Command line Third party systems

27 Foundation Services Extension Services User Clients Services The Wf4Ever Proposal

28 Next steps
» Analyse decay within all myExperiment workflows
› Estimate: Roughly 50% don’t run correctly
› But why? Services gone? Did they ever work?
› How has the community evolved those workflows?
» Service and wf substitutions; recommendations
» Provenance analysis and use
› e.g. verifiability, replayability
» SHIWA approach – can handle “What if the workflow system stops working”

29 Thank you! Any Questions? This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.