Research Objects: Preserving scientific data and methods. Stian Soiland-Reyes, Khalid Belhajjame, School of Computer Science, University of Manchester, myGrid NIHBI.


Research Objects: Preserving scientific data and methods
Stian Soiland-Reyes, Khalid Belhajjame
School of Computer Science, University of Manchester
myGrid
NIHBI meet-up, Manchester

2 Agenda
» Preserving digital science
» The Research Object
» Anatomy
» Lifecycle
» Wf4Ever Tools
» Future developments

3 Computational Processes in Today's Research
» Research is increasingly conducted in digital and online environments
» This has led to the emergence of new digital artifacts
» In some respects, these objects can be regarded as data
» However, some objects include a description of the research method, captured as a computational process
» Such processes encapsulate knowledge about the generation, (re)use and general transformation of data in the experimental sciences
[Figure: Raw data → Computational process → Results]

4 Scientific Workflow
In this work, we focus on a particular kind of computational process called scientific workflows.
» A scientific workflow is a precise, executable description of a scientific procedure: a series of analysis operations connected by data links
» Each operation represents the execution of a computational process
» Operations can be supplied by independently developed web services
» Operations can also use existing data sources that are accessible on the Web
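A minimal sketch of that definition in Python: operations are nodes, data links are edges, and the workflow runs each operation once all of its upstream operations have finished, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The three operations are hypothetical local stand-ins for what would normally be remote web services or Web data sources.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Hypothetical operations standing in for analysis services.
def fetch_ids(inputs: Dict[str, object]) -> object:
    return ["gene_a", "gene_b", "gene_c"]

def filter_ids(inputs: Dict[str, object]) -> object:
    return [g for g in inputs["fetch"] if g != "gene_b"]

def count_ids(inputs: Dict[str, object]) -> object:
    return len(inputs["filter"])

# name -> (operation, names of upstream operations linked to it by data links)
Workflow = Dict[str, Tuple[Callable[[Dict[str, object]], object], List[str]]]

def run_workflow(workflow: Workflow) -> Dict[str, object]:
    """Execute operations in dependency order, passing results along data links."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    results: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        operation, deps = workflow[name]
        results[name] = operation({d: results[d] for d in deps})
    return results

workflow: Workflow = {
    "fetch": (fetch_ids, []),
    "filter": (filter_ids, ["fetch"]),
    "count": (count_ids, ["filter"]),
}

print(run_workflow(workflow)["count"])  # 2
```

Because the graph, not the order of the dictionary, determines execution order, operations from independently developed services can be wired together freely as long as the data links form no cycle.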

5 Preservation Challenges
The challenges stem from the executable nature of workflows and their vulnerability to the volatility of the resources required for their execution.
» Changes by 3rd parties
» The workflow may produce different lists at different times
» The workflow may become inoperable
» Workflow decay: the execution of the workflow may fail or yield different results, due to dependencies on resources and services that are subject to independent changes (e.g., EMBL-EBI). Even workflows that depend on local resources are vulnerable.
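One way to detect workflow decay is to re-run a workflow and compare the outcome with a fingerprint recorded from an earlier run. A minimal Python sketch, assuming results are JSON-serializable; `fingerprint` and `check_decay` are hypothetical helpers, and the accession-style identifiers are made up.

```python
import hashlib
import json
from typing import Callable

def fingerprint(result: object) -> str:
    """Hash a canonical JSON encoding, so equal results give equal fingerprints."""
    canonical = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def check_decay(run: Callable[[], object], recorded: str) -> str:
    """Re-run a workflow and classify the outcome against a recorded run."""
    try:
        result = run()
    except Exception:  # e.g. a remote service has disappeared or changed its interface
        return "failed"
    if fingerprint(result) != recorded:
        return "different results"
    return "reproduced"

recorded = fingerprint(["P12345", "Q67890"])  # hypothetical earlier result list

def broken_service() -> object:
    raise ConnectionError("service unavailable")

print(check_decay(lambda: ["P12345", "Q67890"], recorded))  # reproduced
print(check_decay(lambda: ["P12345"], recorded))            # different results
print(check_decay(broken_service, recorded))                # failed
```

Note that "different results" is not necessarily an error: a legitimate update to a third-party database produces it too, which is why the result still needs interpretation by a person.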

Replicate / Repeat (within a lab): exactly replicate the original experiment and experimental conditions. Eliminate change. Observe.
Reproduce (between labs): run the experiment with differences in experimental conditions. Compare to test for the same result. Observe.
[Figure: a research cycle (capture, curate, discover, use, reuse, preserve) linking laboratory, instruments, methods, materials, data, models, techniques, algorithms and publication, annotated with provenance, attribution and credit, and with context (investigation, study, experiment).]

RO architecture is an hourglass
[Figure: hourglass architecture with the layers: ROs as structured packages; provenance, versioning and Mim services; viewing and collaboration services/protocols; domain (astronomy, biology, ...) services/protocols; exchange services (media specific); storage services (media specific).]

8 From electronic papers to Research Objects
[Figure: a Research Object bringing together datasets, results, scientists, hypothesis, experiments, annotations, provenance, workflows and the electronic paper.]


10 Research Object: A user scenario

11 Why research objects?
» A research object aggregates all elements deemed necessary to understand a research investigation
» Promote reuse and sharing
» Enable verification of the reproducibility of results
» Trackable, versionable, referenceable

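12 Anatomy of a research object
[Figure: a ro:ResearchObject ore:aggregates its ro:Resource items; a ro:Manifest ore:describes the research object; a ro:Folder groups resources through ro:FolderEntry proxies (ore:proxyFor the resource, ore:proxyIn the folder); a ro:SemanticAnnotation is also aggregated by the research object (ore:aggregates), points to the annotated resource via ro:annotatesAggregatedResource, and has an RDF file as its ao:body. The diagram also marks a "Subclass of" relation.]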
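The aggregation part of the diagram can be written down as plain RDF triples. A minimal standard-library Python sketch: the ro: namespace URI is an assumption, the ore: URI is the OAI-ORE terms namespace, and every example.org URI (including the manifest and workflow file names) is hypothetical. A real implementation would more likely use an RDF library.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RO = "http://purl.org/wf4ever/ro#"              # assumed ro: namespace
ORE = "http://www.openarchives.org/ore/terms/"  # OAI-ORE terms

def anatomy_triples(base: str) -> List[Triple]:
    """Core aggregation triples for a hypothetical research object at `base`."""
    manifest = base + "manifest.rdf"     # hypothetical manifest location
    workflow = base + "workflow.t2flow"  # hypothetical aggregated resource
    folder = base + "workflows/"
    entry = base + "workflows/entry-1"
    return [
        (base, RDF_TYPE, RO + "ResearchObject"),
        (manifest, RDF_TYPE, RO + "Manifest"),
        (manifest, ORE + "describes", base),
        (base, ORE + "aggregates", workflow),
        (workflow, RDF_TYPE, RO + "Resource"),
        (base, ORE + "aggregates", folder),
        (folder, RDF_TYPE, RO + "Folder"),
        (entry, RDF_TYPE, RO + "FolderEntry"),
        (entry, ORE + "proxyFor", workflow),  # the entry stands for the workflow...
        (entry, ORE + "proxyIn", folder),     # ...inside this folder
    ]

def aggregated(triples: List[Triple], ro: str) -> List[str]:
    """Resources the research object aggregates directly."""
    return sorted(o for s, p, o in triples if s == ro and p == ORE + "aggregates")

def to_ntriples(triples: List[Triple]) -> str:
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

ro = "http://example.org/ro/1/"
triples = anatomy_triples(ro)
print(aggregated(triples, ro))
print(to_ntriples(triples[:3]))
```

Keeping the folder structure in proxies rather than in the resources themselves means the same workflow can appear in several folders without being copied.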

13 Grounding Workflow-centric Research Objects Using Semantic Technologies
» Workflow-centric research objects are encoded in RDF, according to a set of publicly available ontologies
» Research objects extend the Object Reuse and Exchange (ORE) model to represent aggregation
[Figure: ORE]

14 Grounding Workflow-centric Research Objects Using Semantic Technologies
» We use the Annotation Ontology (AO) to annotate research object resources and their relationships
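Each annotation is aggregated by the research object, points to the resource it describes, and keeps its content in a separate RDF body. A minimal standard-library sketch reusing the term names from the anatomy slide (ro:SemanticAnnotation, ro:annotatesAggregatedResource, ao:body); the ro: and ao: namespace URIs are assumptions, and `annotate`, `bodies_about` and all example.org URIs are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RO = "http://purl.org/wf4ever/ro#"              # assumed ro: namespace
ORE = "http://www.openarchives.org/ore/terms/"  # OAI-ORE terms
AO = "http://purl.org/ao/"                      # assumed Annotation Ontology namespace

def annotate(ro: str, annotation: str, target: str, body: str) -> List[Triple]:
    """Triples for one annotation, mirroring the relations on the anatomy slide."""
    return [
        (annotation, RDF_TYPE, RO + "SemanticAnnotation"),
        (ro, ORE + "aggregates", annotation),
        (annotation, RO + "annotatesAggregatedResource", target),
        (annotation, AO + "body", body),  # the body is a separate RDF file
    ]

def bodies_about(triples: List[Triple], target: str) -> List[str]:
    """Bodies of every annotation that annotates `target`."""
    annotations = {s for s, p, o in triples
                   if p == RO + "annotatesAggregatedResource" and o == target}
    return sorted(o for s, p, o in triples if s in annotations and p == AO + "body")

ro = "http://example.org/ro/1/"
workflow = ro + "workflow.t2flow"
triples = (annotate(ro, ro + "ann/1", workflow, ro + "ann/1-body.ttl")
           + annotate(ro, ro + "ann/2", workflow, ro + "ann/2-body.ttl"))
print(bodies_about(triples, workflow))
```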

15 Relating resources in a research object
[Figure: Workflow_13 and Workflow_16 (labelled "Common pathways" and "QTL") linked to results, logs, metadata, a paper and slides through relations such as "feeds into", "produces", "included in" and "published in".]
» The provenance of RO elements is key to understanding, comparing and debugging scientific workflows, and to verifying the validity of a claim made within the context of an RO
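Provenance links like these make it possible to trace a claim back to the workflows behind it. A minimal Python sketch that walks the links backwards with a breadth-first search; Workflow_13 and Workflow_16 come from the slide, while the other resource names and the exact links are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical provenance links between RO resources: (source, relation, target).
LINKS: List[Tuple[str, str, str]] = [
    ("Workflow_13", "produces", "Results_13"),
    ("Workflow_13", "produces", "Logs_13"),
    ("Results_13", "feeds into", "Workflow_16"),
    ("Workflow_16", "produces", "Results_16"),
    ("Results_16", "included in", "Paper"),
    ("Results_16", "included in", "Slides"),
]

def lineage(resource: str, links: List[Tuple[str, str, str]] = LINKS) -> Set[str]:
    """Every resource that `resource` transitively depends on."""
    upstream: Dict[str, List[str]] = {}
    for source, _relation, target in links:
        upstream.setdefault(target, []).append(source)
    seen: Set[str] = set()
    queue = deque([resource])
    while queue:  # breadth-first search against the direction of the links
        for parent in upstream.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(lineage("Paper")))
```

The logs of Workflow_13 do not appear in the paper's lineage, which is the kind of distinction that helps when debugging or when checking which evidence actually supports a published claim.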

16 Evolution of a research object
» Scientist: works on a Live RO
» My supervisor calls me to report my work > RO snapshot: identified by a URI; some metadata; some curation; mostly private (for my group)
» My supervisor calls me again and we decide to publish our RO + paper > RO snapshot: identified by a URI; some metadata; some curation; mostly private (for my group and for paper reviewers)
» Reviews received and final version published > Archived RO (Librarian/Curator): identified by a URI; good metadata and curation; mostly public
» A new PhD student continues my work (Scientist)
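The lifecycle above can be sketched as a live RO that is periodically frozen into immutable snapshots, the last of which is archived. A minimal Python sketch; the class names, the `snapshots/N` URI scheme and the `derived_from` link (loosely in the spirit of a provenance derivation) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Snapshot:
    uri: str                     # each snapshot gets its own URI
    kind: str                    # "snapshot" or "archived"
    contents: Tuple[str, ...]    # immutable copy of the RO's resources
    derived_from: Optional[str]  # URI of the previous snapshot, if any

@dataclass
class LiveRO:
    uri: str
    contents: List[str] = field(default_factory=list)
    history: List[Snapshot] = field(default_factory=list)

    def freeze(self, kind: str = "snapshot") -> Snapshot:
        """Copy the live RO's current state into a new, immutable snapshot."""
        if kind not in ("snapshot", "archived"):
            raise ValueError(f"unknown kind: {kind}")
        previous = self.history[-1].uri if self.history else None
        snap = Snapshot(
            uri=f"{self.uri}snapshots/{len(self.history) + 1}",
            kind=kind,
            contents=tuple(self.contents),
            derived_from=previous,
        )
        self.history.append(snap)
        return snap

live = LiveRO("http://example.org/ro/1/", ["workflow.t2flow"])
report = live.freeze()                  # supervisor asks for a report
live.contents.append("paper-draft.pdf")
review = live.freeze()                  # shared with paper reviewers
archived = live.freeze("archived")      # final version published
print(archived.uri, archived.derived_from)
```

The live RO keeps changing while every snapshot stays fixed, so a reviewer or a later student can always cite exactly the state that was shared.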

17 W3C PROV standard (Candidate Recommendation): the basis for the evolution model

18 Customizable preservability checklists Wf4Ever Tools
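A checklist is customizable if each community can supply its own list of requirements and their severity. A minimal Python sketch, assuming MUST/SHOULD/MAY levels and three coarse verdicts ("pass", "pass with warnings", "fail"); the levels, verdict names and the example requirements are hypothetical, not the Wf4Ever checklist format.

```python
from typing import Callable, List, NamedTuple, Set

class Requirement(NamedTuple):
    level: str                        # "MUST", "SHOULD" or "MAY"
    description: str
    test: Callable[[Set[str]], bool]  # predicate over the names of the RO's resources

def evaluate(checklist: List[Requirement], resources: Set[str]) -> str:
    """Return a coarse verdict; MAY requirements never affect the outcome."""
    failed_levels = {r.level for r in checklist if not r.test(resources)}
    if "MUST" in failed_levels:
        return "fail"
    if "SHOULD" in failed_levels:
        return "pass with warnings"
    return "pass"

# A hypothetical checklist; each community could supply its own.
CHECKLIST = [
    Requirement("MUST", "aggregates a workflow",
                lambda rs: any(r.endswith(".t2flow") for r in rs)),
    Requirement("SHOULD", "includes provenance of a run",
                lambda rs: "provenance.ttl" in rs),
    Requirement("MAY", "has a README", lambda rs: "README" in rs),
]

print(evaluate(CHECKLIST, {"workflow.t2flow", "provenance.ttl"}))  # pass
print(evaluate(CHECKLIST, {"workflow.t2flow"}))                    # pass with warnings
print(evaluate(CHECKLIST, {"provenance.ttl"}))                     # fail
```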

19 Portal: Browsing and annotating Wf4Ever Tools

20 Command line tools, Client libraries Wf4Ever Tools

21 Specifications and APIs Wf4Ever Tools

22 Current Status and Ongoing Work [3]
» Models/spec v0.1 public:
» Upcoming revision v0.2 (Q1 2013):
  - Minor additions to workflow model terms
  - "RO Terms": an upper, user-level view of an RO (hypothesis, results); many are "shortcuts" for the structured model
  - TODO: update the annotation model to the Open Annotation Data Model (OAC)
  - TODO: PAV for detailed authorship provenance
» Showing, managing and sharing Research Objects through the myExperiment website

23 Open Annotation Data Model
» "Almost final" spec (Community Draft):
» Rollout meeting in Manchester: March 2013
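Moving from AO to the Open Annotation Data Model (listed as a TODO on slide 22) is largely a renaming of the body and target links. A minimal sketch over plain triples, assuming ao:body maps to oa:hasBody and ro:annotatesAggregatedResource maps to oa:hasTarget; this mapping, the AO and RO namespace URIs and the example.org URIs are assumptions, not the project's published migration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
AO_BODY = "http://purl.org/ao/body"                                      # assumed URI
RO_ANNOTATES = "http://purl.org/wf4ever/ro#annotatesAggregatedResource"  # assumed URI
OA = "http://www.w3.org/ns/oa#"

def to_open_annotation(triples: List[Triple]) -> List[Triple]:
    """Rewrite AO/RO annotation links as Open Annotation hasBody/hasTarget links."""
    out: List[Triple] = []
    annotations: Set[str] = set()
    for s, p, o in triples:
        if p == AO_BODY:
            out.append((s, OA + "hasBody", o))
            annotations.add(s)
        elif p == RO_ANNOTATES:
            out.append((s, OA + "hasTarget", o))
            annotations.add(s)
        else:
            out.append((s, p, o))  # keep everything else unchanged
    # Type every rewritten annotation as an oa:Annotation.
    out.extend((a, RDF_TYPE, OA + "Annotation") for a in sorted(annotations))
    return out

ann = "http://example.org/ro/1/ann/1"
target = "http://example.org/ro/1/workflow.t2flow"
old = [(ann, AO_BODY, ann + "-body.ttl"), (ann, RO_ANNOTATES, target)]
for triple in to_open_annotation(old):
    print(triple)
```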

24 myExperiment RO support

Thank you!