Published by Ashley Simon. Modified over 7 years ago.
1
Research Objects for improved sharing and reproducibility Dagstuhl Perspective Workshop on the intersection between Computer Sciences and Psychology Oscar Corcho @ocorcho, Ontology Engineering Group Universidad Politécnica de Madrid (and the Research Object community group)
2
My motivation
3
Some memos from our futuristic scenario
Don’t publish, release (ack: Carole Goble), reloaded (ack: Paul Groth)
Don’t just read a paper, but also view it, play with it, and whatever else
Convert passive papers into active scientific storytellers and alert systems
4
A few quotes from this week
Data (and method) sharing
  Dietrich: The method for investigation is not clearly described
  Eric: Provide links between articles and datasets (interlinking of scholarly content)
  William: Methods are normally reduced to a tiny piece of text
Reproducibility
  Working group on “the present”: Crisis of replicability is driving increased concern and interest
  Eric: 70% of science articles are not reproducible
5
Act 1 Data and method sharing
6
One of the many origins of “Don’t Publish, Release”
A day in Granada… (January, 2012)
Let’s put some of the interesting discussions from the Force11 Dagstuhl meeting into practice
7
One of the origins of “Don’t Publish, Release”
[Diagram: research object lifecycle]
Actors: Scientist; Librarian/Curator
Objects: Live RO; RO snapshot; Archived RO
Links: <<copy>>; <<copy, filter and curate>>; <<versionOf>>
Events: My supervisor calls me to report my work; My supervisor calls me again and we decide to publish our RO+paper; Reviews received and final version published; A new PhD student continues my work
Stage annotations, in order of appearance:
  Identified by a URI; some metadata; some curation; mostly private (for my group)
  Identified by a URI; some metadata; some curation; mostly private (for my group and for paper reviewers)
  Identified by a URI; good metadata and curation; mostly public

We will now illustrate the research object lifecycle through a small example that shows how all the resources contained in a research object are bundled as the scientific experiment progresses. This example lifecycle is summarized graphically on the slide.

A research object normally starts its life as an empty Live Research Object, with a first design of the experiments to be performed (which determines what workflows and resources will be added, by either retrieving them from an existing platform or creating them from scratch). The research object is then filled incrementally by aggregating the workflows that are being created, reused or re-purposed, together with datasets, documents, etc. Any of these components can be changed or removed at any point in time.

In our scenario, we observe several points in time when this Live Research Object gets copied and kept as a Research Object snapshot, which aims to reflect the status of the research object at a given point in time. Such a snapshot may be useful to release the current version of the research outcome of an experiment, submit it to be peer reviewed or to be published (with the appropriate access control mechanisms), share it with supervisors or collaborators, or for acknowledgement and citation purposes. A snapshot may also contain a paper describing the research object in general and the experiment in particular, depending on the policies of the corresponding scientific communication channel, e.g., workshop, conference or journal.

Such snapshots have their own identifiers, and may even be preserved, since it may be useful to track the evolution of the research object over time, so as to allow, for example, retrieval of a previous state of the research object, or reporting to funding agencies on the evolution of the research conducted.

At some point in time, the research object may get published and archived, in what we know as an Archived Research Object, with a permanent identifier. Such a version of our research object may be the result of copying our Live Research Object completely, or it may be the result of some filtering or curation process where only some parts of the information available in the aggregation are actually published for others to reuse.

As illustrated in Figure 4, a user can use an existing Archived Research Object as a starting point for his or her research, e.g., to repurpose it or its parts, in which case a new Live Research Object is created based on the existing Archived Research Object.

This is only one of the many potential scenarios that could be foreseen for the lifecycle of a workflow-centric research object, and we are currently defining different storyboards for their evolution. One important aspect to highlight is that during its whole lifecycle the research object is aggregating new objects. The annotation process during the lifecycle of experimentation allows the generation of sufficient metadata about the research objects to support preservation and sharing. Therefore, when a scientist decides to preserve it, most of the annotations needed for that preservation process will already be available inside the research object.
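The lifecycle described in these notes (live object, snapshots, archived object, new live object derived from an archive) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ResearchObject` class, its method names, the `urn:uuid:` identifiers and the file names are all hypothetical and are not part of any Research Object tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional
import uuid


@dataclass
class ResearchObject:
    """Hypothetical in-memory stand-in for a research object."""
    kind: str                                   # "live", "snapshot" or "archived"
    resources: List[str] = field(default_factory=list)
    uri: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    derived_from: Optional[str] = None          # URI of the RO this one was copied from

    def aggregate(self, resource: str) -> None:
        # Only a live RO keeps growing; snapshots and archives are frozen copies.
        if self.kind != "live":
            raise ValueError("only a live RO accepts new resources")
        self.resources.append(resource)

    def snapshot(self) -> "ResearchObject":
        # <<copy>>: freeze the current state under a new identifier.
        return ResearchObject("snapshot", list(self.resources), derived_from=self.uri)

    def archive(self, keep: Callable[[str], bool] = lambda r: True) -> "ResearchObject":
        # <<copy, filter and curate>>: publish only the resources that pass the filter.
        kept = [r for r in self.resources if keep(r)]
        return ResearchObject("archived", kept, derived_from=self.uri)

    def fork(self) -> "ResearchObject":
        # A new live RO that starts from an archived one.
        if self.kind != "archived":
            raise ValueError("this sketch only forks archived ROs")
        return ResearchObject("live", list(self.resources), derived_from=self.uri)


live = ResearchObject("live")
live.aggregate("workflow.t2flow")
live.aggregate("notes-private.txt")
snap = live.snapshot()                          # shared with the supervisor
live.aggregate("results.csv")
archived = live.archive(keep=lambda r: "private" not in r)
student = archived.fork()                       # a new PhD student continues the work
```

The snapshot keeps the state at the time of the copy, while the archived object reflects the curated final state; `derived_from` plays the role of the `<<versionOf>>` and `<<copy>>` links on the slide.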
8
How do you usually structure your experiment?
In a set of folders? Dropbox? Google Drive? GitHub? Overleaf+figshare? Whatever???
These could be profiles for how you normally structure your research
9
Scattered Assets: Slideshare, Community db, Github, figshare, Arxiv.org
10
Research Objects
A framework to bundle, port and link (scattered) resources and related experiments.
Metadata objects that carry research context.
Units of exchange.
Multi-various products, platforms, resources
First-class citizens: id, manage, credit, track, profile, focus
11
RO main principles Identity Metadata Description Aggregation
Identity: Refer to aggregations and their contents
Metadata Description:
  Interpretation: The objects; how they are linked together
  Attribution: Who, when, where, why?
Aggregation: Describe group & constituents (external ids, local files)
manifest
12
RO main principles: technologies
Identity: DOIs, URIs, Handles, ORCID. Identity persistence and resolution, names, citation
Annotation: W3C OADM. Annotation first class and stand-off
Aggregation: OAI-ORE. Aggregations, resource maps, proxies. Packaging: physical and logical containers
manifest

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) is a standard for describing aggregations of web resources
  Uses a Resource Map to describe the aggregated resources
  Proxies allow for statements about the resources within the aggregation
  Capturing context and viewpoints
  Several concrete serialisations: RDF/XML, Atom, RDFa

The Open Annotation specification is a community-developed data model for annotation of web resources
  Developed by the W3C Open Annotation Community Group
  Allows for “stand-off” annotations
  Annotation as a first class citizen
  Developed to fit with Web Architecture
  Point of extensibility
13
RO principles Use unique identifiers as names for things Use some mechanism of aggregation to group things together Provide metadata about those things & how they relate to each other.
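The three principles (identifiers, aggregation, metadata about the aggregated things) can be illustrated with a small manifest-like structure. This is a minimal sketch, not the RO model itself: the field names `id`, `aggregates`, `annotations`, `about` and `content`, the helper names and the `example.org` URLs are assumptions chosen for the example.

```python
import json


def build_manifest(ro_id, resources, annotations):
    """Assemble a manifest-like dict (field names are illustrative assumptions)."""
    return {
        "id": ro_id,                                          # identity
        "aggregates": [{"uri": uri} for uri in resources],    # aggregation
        "annotations": [                                      # metadata about the parts
            {"about": about, "content": content}
            for about, content in annotations
        ],
    }


def dangling_annotations(manifest):
    """Return annotation targets that are neither the RO itself nor aggregated by it."""
    known = {entry["uri"] for entry in manifest["aggregates"]}
    known.add(manifest["id"])
    return [a["about"] for a in manifest["annotations"] if a["about"] not in known]


manifest = build_manifest(
    "https://example.org/ro/my-experiment/",
    ["https://example.org/data/input.csv", "workflow/analysis.py"],
    [
        ("workflow/analysis.py", "annotations/analysis-description.txt"),
        ("figures/plot.png", "annotations/plot-caption.txt"),
    ],
)
print(json.dumps(manifest, indent=2))
print(dangling_annotations(manifest))
```

The second annotation points at a resource that was never aggregated, which is the kind of inconsistency a manifest makes easy to detect.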
14
RO Model Ontology Defines core concepts of research objects, identity, aggregation, annotation. Used in the manifest
15
RO Model Ontology
16
Manifest – remote and local
on my machine
17
https://researchobject.github.io/specifications/bundle/
Export, archive, publish and transfer ROs.
File format for storage and distribution of ROs as a ZIP archive
Includes an RO’s manifest, annotations and some or all of its aggregated resources
Basis for more specific file formats
Backwards compatible: it’s ZIP
Programmatic access: JSON and JSON-LD manifest, API
Capture a Research Object to a single file or byte-stream by including its manifest, annotations and some or all of its aggregated resources for the purposes of exporting, archiving, publishing and transferring research objects.
doi: /zenodo.10440
18
https://researchobject.github.io/specifications/bundle/
Capture a Research Object to a single file or byte-stream by including its manifest, annotations and some or all of its aggregated resources for the purposes of exporting, archiving, publishing and transferring research objects.
Not everyone is able to set up RESTful semantic web servers; in particular we’ve run into this with desktop applications, where users just want to save files and decide for themselves where they are stored. So we decided to write a serialization format for Research Objects, which we call the RO Bundle. We wanted this to be accessible to application developers, so we’ve adopted ZIP and JSON, and in a way this lets you create research objects and make annotations without ever seeing any RDF.
doi: /zenodo.10440
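A bundle of this kind can be produced with nothing more than a ZIP library and a JSON encoder. The sketch below uses only Python's standard library. The `mimetype` entry, its value and the `.ro/manifest.json` path reflect our reading of the RO Bundle specification and should be checked against the specification linked on the slide; the manifest fields, the function name `write_bundle` and the file names are assumptions made for the example.

```python
import io
import json
import zipfile

# Assumed values; verify against the RO Bundle specification before relying on them.
BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"


def write_bundle(target, manifest, files):
    """Write a manifest plus aggregated files into a single ZIP archive."""
    with zipfile.ZipFile(target, "w") as zf:
        # The media type goes first and uncompressed so tools can sniff it.
        zf.writestr("mimetype", BUNDLE_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2),
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


manifest = {
    "id": "/",
    "aggregates": [{"uri": "/workflow/analysis.py"}],
    "annotations": [],
}

buffer = io.BytesIO()                     # a byte-stream instead of a file on disk
write_bundle(buffer, manifest, {"workflow/analysis.py": "print('hello')\n"})

# Any ZIP reader can open the result; no RDF tooling is involved.
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    loaded = json.loads(zf.read(MANIFEST_PATH))
```

Because the result is an ordinary ZIP file, a desktop application can save it wherever the user chooses, which is the use case described in the notes above.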
19
Containers
20
Research Objects: Scopes and Tooling
Farr Commons: ISA and FAIR-DOM SEEK
COMBINE
BagIt (soon)
White-labelled, sci-domain-independent software
Core ontologies and extensions
RO managers/APIs/bundling (Ruby, Java, Python)
Latex2RO
LDP4RO
21
Publishing may be as easy as…
Providing the URL of the Research Object to the publisher, with a release tag, to start the review process (if extra review needed)
22
Act 2 Reproducibility
23
Terminology
Preservation: Keep it in a perfect/unaltered condition. Preserving the integrity and authenticity.
Conservation: Action of prolonging the existence of significant objects. Researching, recording and retaining all information related to the object. Documenting.
Restoration: Return something to an earlier condition.
Reconstruction: Forming again, with improvements or removal of defects.
“Two opposing factions had emerged within the environmental movement by the early 20th century: the conservationists and the preservationists. The conservationists (such as Gifford Pinchot) focused on the proper use of nature, whereas the preservationists sought the protection of nature from use. Put another way, conservation sought to regulate human use while preservation sought to eliminate human impact altogether.”
Inspired by [Goble, 2012]
24
Terminology Preservation Inspired by [Goble, 2012]
25
Terminology Preservation Conservation Inspired by [Goble, 2012]
26
Terminology Preservation Restoration Conservation
Inspired by [Goble, 2012]
27
Terminology Preservation Restoration Conservation Reconstruction
Inspired by [Goble, 2012]
28
The Research Method in different disciplines
IN VIVO/VITRO INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT IN SILICO
29
The Research Method in different disciplines
Laboratory Protocol (recipe) Experiment Lab book Workflow Digital Log
This is the What: detect common groups of tasks, vs. How: exact and inexact FGM techniques, vs. Why?
30
The Research Method in different disciplines
IN VIVO/VITRO INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT IN SILICO
31
Some problems in lab protocols
Incubate the centrifuge tubes in a water bath.
Incubate the samples for 5 min with gentle shaking.
Rinse DNA briefly in 1-2 ml of wash.
Incubate at -20C overnight.
Some protocols present insufficient granularity, and the instructions can be imprecise or ambiguous due to the use of natural language.
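To make the problem concrete, the four instructions above can be run through a very small checker that flags missing parameters and vague wording. This is a toy illustration, not part of SMART Protocols: the regular expressions, the list of vague terms and the rule that an incubation step needs both a temperature and a numeric duration are assumptions made for this sketch.

```python
import re

# Hypothetical patterns: a temperature such as "-20C" and a numeric duration such as "5 min".
TEMPERATURE = re.compile(r"-?\d+(\.\d+)?\s*°?\s*C\b")
DURATION = re.compile(r"\b\d+(\.\d+)?\s*(s|sec|min|h|hr|hours?|minutes?|seconds?)\b")
VAGUE_TERMS = ("briefly", "gentle", "overnight")


def review_instruction(text):
    """Return a list of reasons why an instruction may be hard to reproduce."""
    issues = []
    lowered = text.lower()
    # Assumed rule: an incubation step should state temperature and duration.
    if lowered.startswith("incubate"):
        if not TEMPERATURE.search(text):
            issues.append("no temperature")
        if not DURATION.search(text):
            issues.append("no numeric duration")
    for term in VAGUE_TERMS:
        if term in lowered:
            issues.append(f"vague term: {term}")
    return issues


instructions = [
    "Incubate the centrifuge tubes in a water bath.",
    "Incubate the samples for 5 min with gentle shaking.",
    "Rinse DNA briefly in 1-2 ml of wash.",
    "Incubate at -20C overnight.",
]
report = {text: review_instruction(text) for text in instructions}
for text, issues in report.items():
    print(text, "->", issues)
```

Every instruction on the slide is flagged at least once, which is the granularity problem that a formal protocol representation is meant to address.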
32
Currently… How to formalize the information from laboratory protocols as a knowledge base? Semi-structured information Ontologies + NLP tools Unstructured information
33
SMART Protocols - document
Rhetorical and structural components (e.g. introduction, materials, and methods); Information like application of the protocol, advantages and limitations, list of reagents, critical steps.
34
SMART Protocols - wf
Representation of the workflow aspects in protocols: the implicit order in the instructions, following the input-output structure.
35
SMART Protocols documentation
SMART Protocols ontology is available here:
Giraldo O, García-Castro A, Corcho O. SMART Protocols: SeMAntic RepresenTation for Experimental Protocols. LISC2014
The ontologies are available here, and a paper describing the ontology design was recently accepted at the Linked Science 2014 workshop. So far, we have covered one way to formally report a lab protocol.
36
SMART Protocols in action
[Diagram: RDF graph describing a DNA extraction protocol]
Classes: sp:experimental protocol; sp:DNA extraction protocol; sp:title of the protocol; sp:author entry; sp:advantages; sp:sample; sp:application of the protocol
Properties: rdf:type; owl:subClassOf; sp:hasTitle; sp:hasAuthor; ro:partOf
Prefixes: sp = SMART Protocols, ro = Relation Ontology
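The graph on this slide can be approximated as a list of subject-predicate-object triples. The sketch below is a guess at how the labelled nodes connect, since the original figure is not recoverable from the transcript: the instance names under `ex:` are hypothetical, the edges are assumed, and the subclass link is written as `rdfs:subClassOf` (the RDF Schema property) where the slide label reads `owl:subClassOf`. Plain tuples are used so that no RDF library is needed.

```python
# Each triple is (subject, predicate, object); all ex: names are hypothetical instances.
TRIPLES = [
    ("sp:DNA extraction protocol", "rdfs:subClassOf", "sp:experimental protocol"),
    ("ex:protocol1", "rdf:type", "sp:DNA extraction protocol"),
    ("ex:protocol1", "sp:hasTitle", "ex:title1"),
    ("ex:title1", "rdf:type", "sp:title of the protocol"),
    ("ex:protocol1", "sp:hasAuthor", "ex:author1"),
    ("ex:author1", "rdf:type", "sp:author entry"),
    ("ex:advantages1", "rdf:type", "sp:advantages"),
    ("ex:advantages1", "ro:partOf", "ex:protocol1"),
    ("ex:sample1", "rdf:type", "sp:sample"),
    ("ex:sample1", "ro:partOf", "ex:protocol1"),
    ("ex:application1", "rdf:type", "sp:application of the protocol"),
    ("ex:application1", "ro:partOf", "ex:protocol1"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def parts_of(triples, whole):
    """All subjects linked to `whole` through ro:partOf."""
    return [s for s, p, o in triples if p == "ro:partOf" and o == whole]


def is_instance_of(triples, node, cls):
    """Type check that follows one or more rdfs:subClassOf links."""
    pending = objects(triples, node, "rdf:type")
    seen = set()
    while pending:
        current = pending.pop()
        if current == cls:
            return True
        if current not in seen:
            seen.add(current)
            pending.extend(objects(triples, current, "rdfs:subClassOf"))
    return False


protocol_parts = parts_of(TRIPLES, "ex:protocol1")
is_experimental = is_instance_of(TRIPLES, "ex:protocol1", "sp:experimental protocol")
```

Once a protocol is expressed this way, questions such as "which sections belong to this protocol?" become simple graph lookups rather than text searches.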
37
SMART Protocols in action
38
The Research Method in different disciplines
IN VIVO/VITRO INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT IN SILICO
39
Vocabularies and methodologies for representing and publishing workflows
Methodology for workflow publishing
Diagram labels: Workflow Template; Workflow Instance; Workflow Plan; Workflow Provenance; Wings workflow generation; PROV export; OPM/PROV conversion; Publication; Share; Reuse; Core Portal; WINGS on local laptop; WINGS on shared host; WINGS on web server; Linked Data Publication; RDF TripleStore; Interactive browsing (Pubby frontend); Programmatic access (external apps); Users; Other workflow environments
Repository of linked workflows:
Daniel Garijo and Yolanda Gil. A new approach for publishing workflows: abstractions, standards, and linked data. (WORKS '11). ACM, New York, NY, USA.
Daniel Garijo and Yolanda Gil. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston, 2012.
40
Definition of workflow abstractions
Catalog of common independent workflow abstractions (motifs)
  Data-oriented motifs: What kind of manipulations does the workflow have?
  Workflow-oriented motifs: How does the workflow perform its operations?
Analysis of 260 workflows from 10 domains, belonging to 5 different workflow systems
Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, Carole Goble. Common motifs in scientific workflows: An empirical analysis. Future Generation Computer Systems, Volume 36, July 2014.
41
Finding and evaluating common abstractions
Graph mining techniques
Workflow fragment filtering techniques
Workflow fragment representation and linkage
Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. FragFlow: Automated Fragment Detection in Scientific Workflows. In The 10th IEEE International Conference on e-Science, Guaruja, 2014.
42
How to preserve Workflows/Research Objects?
Three main ways/levels:
  Descriptive reproducibility: Documentation
  Workflow execution reproducibility: Can we run the workflow?
  Workflow results reproducibility: Can we get the same results?
Checklists!
  Corcho et al: Checklist for workflow conservation. 40 different aspects: Goals, Results, Metadata
  Corcho et al: Checklist for a workflow conservation plan. Based on the DCC’s data management plan
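The three levels can be read as a ladder: documentation first, then a successful re-run, then matching results. The sketch below encodes that reading; treating the levels as strictly cumulative is an assumption of this example, and the names `Level` and `assess` as well as the toy workflows are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    NONE = 0          # nothing to go on
    DESCRIPTIVE = 1   # the workflow is documented
    EXECUTION = 2     # the workflow can be run again
    RESULTS = 3       # the re-run yields the same results


def assess(has_documentation, rerun, expected):
    """Return the highest level reached, assuming each level builds on the previous one."""
    if not has_documentation:
        return Level.NONE
    try:
        produced = rerun()            # "Can we run the workflow?"
    except Exception:
        return Level.DESCRIPTIVE
    # "Can we get the same results?"
    return Level.RESULTS if produced == expected else Level.EXECUTION


def broken_workflow():
    raise RuntimeError("missing binary")


same = assess(True, lambda: [1, 2, 3], expected=[1, 2, 3])
different = assess(True, lambda: [1, 2, 4], expected=[1, 2, 3])
not_runnable = assess(True, broken_workflow, expected=[1, 2, 3])
undocumented = assess(False, lambda: [1, 2, 3], expected=[1, 2, 3])
```

In practice the comparison of results is rarely a plain equality test (floating-point noise, timestamps, non-deterministic steps), so the last check is where most of the domain-specific judgement goes.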
43
Some examples Levels of reproducibility Workflow conservation Plan
44
The Research Method in different disciplines
IN VIVO/VITRO INPUT DATA SCIENTIFIC PROCEDURE EQUIPMENT IN SILICO
45
Reproducibility of Computational Scientific Experiments
Diagram labels: Former equipment; Annotate; Semantic annotations; Reproduce; Equivalent execution environment; Cloud
Workflows: Dispel4Py (Internal Extinction, Seismic Cross Correlation); Pegasus (Montage, SoyKB, Epigenomics); Makeflow (Blast)
46
Our Approach to Experiment Conservation
WICUS Framework overview This is an overview of the system we propose. WICUS stands for Workflow Infrastructure Conservation Using Semantics…
47
Pegasus Montage Workflow
Some results
Pegasus Montage Workflow
  Astronomy workflow
  Construct large image mosaics of the sky
  Montage software distribution: 59 binaries
Target IaaS cloud providers: Amazon EC2 & Futuregrid
Vagrant
RO available at
48
Lessons learned for Anna
Research Objects as a concept
  Identity, annotation, aggregation
  Adapted to the tools/infrastructure for each domain
  With some tooling available already
It’s not just data preservation but also methods
  Lab protocols
  Computational workflows
Understand what reproducibility means for you
50
The Semantic e-Science team at UPM
Acknowledgements
The Semantic e-Science team at UPM: Carlos Badenes, Daniel Garijo, Olga Giraldo, Rafael González-Cabero, Idafen Santana
The Wf4Ever team: Carole Goble, José Manuel Gómez Pérez, Raúl Palma, Jun Zhao, Stian Soiland-Reyes, Khalid Belhajjame, José Enrique Ruíz, Marco Roos, Lourdes Verdes-Montenegro, Norman Morrison, Sean Bechhofer, Graham Klyne, Matt Gamble, and a large etcetera
The Research Object community group