Responsible Citizenship of the World of Science


Responsible Citizenship of the World of Science Using Persistent, Unique Identifiers for Samples Wim Hugo SAEON, ICSU-WDS

Too Large and Complex to be Useful to Science… The Complete Web: every piece of information at a physical network node is potentially in multiple relationships with every other. This enormous graph is many times larger than the physical internet (1) and is not practically useful for science. Formal Meta-Data: very few links are formally specified, eliminating almost all of the potential links between pieces of information to favour only a very rigid collection. (1) Fensel, D. and van Harmelen, F. (2007). Unifying Reasoning and Search to Web Scale, IEEE Computer Society, 1089-7801/07. http://www.cs.vu.nl/~frankh/postscript/IEEE-IC07.pdf

The LOD “Cloud” 2011

The LOD “Cloud” April 2014

Credibility of Science Access to original and complete data sets for reproducibility Re-usability declines with time Availability declines with age http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000308#pone-0000308-g002 http://www.sciencedirect.com/science/article/pii/S0960982213014000

Rationale Reduction in Complexity of the Semantic Web Citability and Incentivisation Re-usability (Interoperability) Reproducibility Discoverability

Solution: Considerations

Charles Babbage (1791-1871)

Sir Robert Peel (1788-1850) The British Parliament, after investing £20,000 in the Difference Engine project (about £2,400,000 in 2017 terms), was treated to a demo. “Can you set the machine to calculate the time at which it will be of some use?”

Information Technology …

Systems Engineering Technology Drivers Patterns Use Cases Design Considerations and Architecture Other (Science) Drivers Solution(s) Implementation

Persistent Identifiers and Science The Fabric of Science The Process of Science The Language of Science

Persistent Identifiers and Science The Fabric of Science (Mostly Captured in Metadata) The Process of Science (Mostly Captured in Metadata) The Language of Science (Mostly Captured in Data)

The Fabric of Science ICSU-WDS Knowledge Network Scholarly Publications (CrossRef?) TDRs (WDS, DSA, DataCite*) Samples and Events People (ORCID, …) RDI Outputs/ Online Resources Coverage (Temporal, Spatial, Topic) Data Citations (DataCite) Institutions (GRID, ISRI) Projects Use, Caveats, Lineage, Provenance, Methods Initiatives Licenses (Creative Commons) Networks Platforms, Instruments, Deployments, Sites, … * Including re3data, DataBib Funders (?) Exists Started Not Now WDS ICSU-WDS Knowledge Network

The Process of Science ICSU-WDS Knowledge Network Sample, Specimen, Member “Standard Variables” Standard Transformation Real World/ Events Processing Observations, Media “Analysis Ready Data” Analysis and Workflows “Publication Ready Output” ICSU-WDS Knowledge Network

The Language of Scientific Data Unstructured Normalised Graph Ontology (Resolution uncertainty) Time Vocabularies (“stable”) Spatial/ Location Coverage (Temporal, Spatial, Topic) Registries (“unstable”) Dimensions (External Entities) Topic Variable Non-Standardised “Controlled” Variables

Framework: Elements of a Solution Domain-Specific or Community-Specific PIDs Governance Process Framework Technical Guidance Conceptual Framework and Metamodels End Users and Systems Certification

Design Consideration #1 Precision, Vocabularies, PIDs, and LOD Precision is Critical in Formal, Structured Data if used for dimensions Precision is desirable for other data Graph edges have a weight distribution … assertion Creativity is impossible if precision is perfect Guidance required on best practices Value judgment on usability/ trust Some progress in this symposium Technical Guidance

Design Consideration #2 Single Point of Resolution Option 1 – voluntary/ community indexing Option 2 – dedicated resolver Option 3 – ‘publication’ metadata Agreement on minimum index metadata Governance and sustainability No progress thus far Governance | Certification
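Option 2 (a dedicated resolver) can be pictured with a minimal sketch. It assumes PIDs are written as `scheme:value`, which is a convention chosen here for illustration and not something the talk specifies. The `RESOLVERS` table and the `resolve` function are hypothetical names; the three base URLs are the public DOI, ORCID and Handle resolvers.

```python
# Hypothetical single point of resolution: one lookup table that maps a
# PID scheme to the base URL of the service that resolves it.
RESOLVERS = {
    "doi": "https://doi.org/",
    "orcid": "https://orcid.org/",
    "hdl": "https://hdl.handle.net/",
}


def resolve(pid):
    """Turn a 'scheme:value' PID into a resolvable URL (assumed format)."""
    scheme, sep, value = pid.partition(":")
    if not sep or not value:
        raise ValueError(f"expected 'scheme:value', got {pid!r}")
    try:
        base = RESOLVERS[scheme.lower()]
    except KeyError:
        # An unknown scheme is exactly the gap a shared index would close.
        raise LookupError(f"no resolver registered for scheme {scheme!r}") from None
    return base + value


url = resolve("doi:10.1371/journal.pone.0000308")
```

A real service would also need the agreed minimum index metadata and a governance model, which this sketch leaves out entirely.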

Option 1 - Hybrid Graph Solution: “Scholix” Scholarly Publications (CrossRef?) TDRs (WDS, DSA, DataCite*) Samples and Events People (ORCID, …) RDI Outputs/ Online Resources Coverage (Temporal, Spatial, Topic) Data Citations (DataCite) Institutions (GRID, ISRI…) Projects Use, Caveats, Lineage, Methods Initiatives Licenses (Creative Commons) Networks Platforms, Instruments, Deployments, Sites, … * Including re3data, DataBib Funders (?) Exists Started Not Now WDS http://www.scholix.org/

The Fabric of Science ICSU-WDS Knowledge Network Scholarly Publications (CrossRef?) TDRs (WDS, DSA, DataCite*) Samples and Events People (ORCID, …) RDI Outputs/ Online Resources Coverage (Temporal, Spatial, Topic) Data Citations (DataCite) Institutions (GRID, ISRI, …) Projects Use, Caveats, Lineage, Methods Initiatives Licenses (Creative Commons) Networks Platforms, Instruments, Deployments, Sites, … * Including re3data, DataBib Funders (?) Exists Started Not Now WDS ICSU-WDS Knowledge Network

Design Consideration #3 Single Point(s) of Failure Persistent Identifiers are now fundamental to research data infrastructure and cannot be allowed to fail. Yet many of these services are community-driven, poorly funded, and sometimes rely on voluntary contributions to function Identify critical elements of RDI and fund this reliably and into the long term Certify services and manage to maturity Some progress in this symposium Governance | Certification

Design Consideration #4 Conceptual Models and Semantics Conceptual Model – “Because it is a sample” Conceptual Model – “Because of the domain” Conceptual Model – “Because of the owner” Perspective from Statistics Develop a Conceptual Model that allows Protocol-specific, Organisation-Specific, and Domain-specific metadata Influences resolution and higher-level sample metadata Hopefully constructed from existing metadata standards without major modification Good progress in this symposium Conceptual/ Metamodel

A Bit of DataCite Schema RelatedIdentifier Suggestion to add sample-related vocabulary Supplementary Materials Suggestion to add ‘Sample Metadata’ as a link Keyword/ Subject Element Infinitely extensible
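As a rough illustration of the RelatedIdentifier element, the sketch below builds a DataCite-style fragment that links a dataset to a physical sample. It uses `IGSN` as the `relatedIdentifierType` and `IsDerivedFrom` as the `relationType`, both of which are existing DataCite values; a sample-specific relation is what the slide proposes to add and is not assumed here. The function name and the sample identifier are placeholders, and the fragment omits namespaces and the rest of the record.

```python
import xml.etree.ElementTree as ET


def related_sample_element(sample_pid, pid_type="IGSN", relation="IsDerivedFrom"):
    """Build a <relatedIdentifiers> fragment pointing at one sample PID."""
    related = ET.Element("relatedIdentifiers")
    item = ET.SubElement(
        related,
        "relatedIdentifier",
        {"relatedIdentifierType": pid_type, "relationType": relation},
    )
    item.text = sample_pid
    return related


# "EXAMPLE-SAMPLE-0001" is a made-up placeholder, not a registered identifier.
xml_text = ET.tostring(related_sample_element("EXAMPLE-SAMPLE-0001"), encoding="unicode")
print(xml_text)
```

Reusing an existing relation type keeps the record valid today, at the cost of not saying explicitly that the related object is a sample.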

Design Consideration #5 Definition of Critical Dimensions for Data Families Sample Identifier is Often Required and Must Become Common Practice in Datasets Vocabulary or PID for each Dimension Develop a Conceptual Model for each Data Family and Discipline/ Standard Variable Influences metadata and data content standards No progress thus far Best Practice/ Technical Guidance

Generic Dimensions of Data Sample S Spatial Coverage XYZ Temporal Coverage: T Topic or Semantic/ Ontological Coverage D: Demographic P: Phenomenon mostly physical, chemical, or other contextual data B: Biological Tx: Species and Taxonomy (with some extensions) Al: Allele/ Genome/ Phylogenetic Ch: Characteristics, Traits, and Life Stages Each unique combination of these, supported by a vocabulary/ ontology, is a generic data family Continuous or Near-Continuous: Uppercase Discrete or dispersed: Lowercase Best Practice/ Technical Guidance
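The dimension notation can be made concrete with a small parser. This is a sketch under assumptions: it treats a data family as a comma-separated signature such as `XYz, t, Tx, S`, reads the case of each spatial or temporal letter as continuous (uppercase) or discrete (lowercase), and looks up the remaining codes in a table copied from the list above. The names `describe_family` and `DIMENSION_CODES` are illustrative only, and combined tokens such as `P/C` are not handled.

```python
# Codes for sample and topic dimensions, taken from the list above.
DIMENSION_CODES = {
    "S": "Sample",
    "D": "Demographic",
    "P": "Phenomenon",
    "B": "Biological",
    "Tx": "Species and taxonomy",
    "Al": "Allele/genome/phylogenetic",
    "Ch": "Characteristics, traits, and life stages",
}


def _coverage(letter):
    # Uppercase: continuous or near-continuous; lowercase: discrete or dispersed.
    return "continuous" if letter.isupper() else "discrete"


def describe_family(signature):
    """Parse a signature such as 'XYz, t, Tx, S' into its dimensions."""
    result = {"spatial": {}, "temporal": None, "other_dimensions": []}
    for token in (part.strip() for part in signature.split(",")):
        if not token:
            continue
        if token in DIMENSION_CODES:
            result["other_dimensions"].append(DIMENSION_CODES[token])
        elif token.lower() == "t":
            result["temporal"] = _coverage(token)
        elif set(token.lower()) <= set("xyz"):
            for letter in token:
                result["spatial"][letter.lower()] = _coverage(letter)
        else:
            raise ValueError(f"unrecognised dimension token: {token!r}")
    return result


family = describe_family("XYz, t, Tx, S")
```

Here `family` records continuous x and y coverage, a discrete z axis, discrete time, and the taxonomy and sample dimensions.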

Some Generic Data Families and Crosswalk Requirements Typical Dimensions/ Content Typical Infrastructure Typical Syntax/ Schema Object xyz, t, P/C, S DDI “Sparse” NetCDF XYZ, T, P, S OPeNDAP Multi-dimensional S-DB Traditional Spatial XYz, t, P, S WxS O&M Signals XYZ, T, P/ B, S SensorThings General Structured, Media, Objects Object xyz, t, P/ B, S CSV, PDF, ZIP GBIF Index XYz, t, Tx, S Species Occurrence, … DwC GenBank XYZ, T, Al, S Genetic FTP/ ASN.1 Now Implementing: ✪ Array Databases/ Virtual Cubes for Everything WCS

Simple or Core Information Model Genes and Alleles Species and Taxons Sampling Event Spatial and Temporal Coverage Life Stages, Traits and Characters Physical Phenomena Best Practice/ Technical Guidance

Example: Taxon Abundance, Presence and Absence Genes and Alleles Relationship Species and Taxons Sampling Event Spatial and Temporal Coverage Life Stages, Traits and Characters Physical Phenomena Best Practice/ Technical Guidance

Example: Phylogenetic Data Genes and Alleles Relationship Species and Taxons Sampling Event Spatial and Temporal Coverage Life Stages, Traits and Characters Physical Phenomena Best Practice/ Technical Guidance

Example: Morphology Genes and Alleles Relationship Species and Taxons Sampling Event Spatial and Temporal Coverage Life Stages, Traits and Characters Physical Phenomena Best Practice/ Technical Guidance

Example: Biome Definition, Ecosystem Services Genes and Alleles Relationship Species and Taxons Sampling Event Spatial and Temporal Coverage Life Stages, Traits and Characters Physical Phenomena Best Practice/ Technical Guidance

Design Consideration #6 Identifiers for the Fabric of Science Certification of the Process of Science PIDs for All Important Agents/ Objects Repositories for Objects and Artifacts Registries and Name Services World Data System/ DSA considering extension beyond data Community Assessments – Rating-driven (TripAdvisor, …) Comment-driven (GitHub, ...) Good progress in this symposium Certification

Design Consideration #7 Solution Granularity Too specialised: reduced utility and much duplication Too generalised: all communities miss critical aspects Two-tiered solution? Some progress in this symposium Certification

“Granularity” of a Solution Universalise? Generalise Specialise “change the code” “change the configuration” “change the reference model”


Actions Who takes this forward? RDA Interest Group WG: Sample – Conceptual Model/ Metadata WG: Develop Two-tiered Solution? WG: Criteria for Trusted Name Services and Registries WG: Best Practices in Respect of Sample Management ICSU-WDS/ DSA/ Institutions (IGSN, Pangaea, DataCite, …) Develop criteria for trusted sample repositories Implement certification infrastructure Implement Scholix Generalisation – institution required

“Metadata is a tax in the Data World. They may not like it, but responsible citizens pay their taxes. Citizens cannot expect services and infrastructure to be built on their behalf if they do not pay tax.”

?