Data Discovery Paradigms Interest Group 4 April, 2019 RDA 13th Plenary Meeting, Philadelphia Siri Jodha Singh Khalsa Fotis Psomopoulos Mingfang Wu.

Slides:



Advertisements
Similar presentations
OMV Ontology Metadata Vocabulary April 10, 2008 Peter Haase.
Advertisements

VCC3 Proposal Organisation of the tasks Sophie David, Jean-Luc Minel 28 th -29 th August 2012, Dublin.
IPlant Data Commons. iPlant Data Commons leverages all elements of our CI to enhance data management, discoverability, and reuse.
Data Sources & Using VIVO Data Visualizing Scholarship VIVO provides network analysis and visualization tools to maximize the benefits afforded by the.
RDA Wheat Data Interoperability Working Group Outcomes RDA Outputs P5 9 th March 2015, San Diego.
RDA Wheat Data Interoperability Working Group Outcomes RDA Outputs P5 9 th March 2015, San Diego.
DDI-RDF Discovery Vocabulary A Metadata Vocabulary for Documenting Research and Survey Data Linked Data on the Web (LDOW 2013) Thomas Bosch.
METADATA QUALITY IN EUROPEANA , Den Haag.
Save time. Reduce costs. Find and reuse interoperability solutions on Joinup for developing European public services Nikolaos Loutas
Towards Data Management Principles (report of progress of the Task Force on Data Management Principles) Alessandro Annoni European Commission Joint Research.
1 24 September BREAKOUT :30 1)Review of Metadata Standards Directory (DCC version and GitHub) 2)Introduction of Metadata Standards Catalog.
A Systemic Approach for Effective Semantic Access to Cultural Content Ilianna Kollia, Vassilis Tzouvaras, Nasos Drosopoulos and George Stamou Presenter:
Laura Russell Programmer VertNet Buenos Aires (Argentina) 28 September 2011 Training course on biodiversity data publishing and.
Hydro DWG at the RDA Plenary BoF - Improve sharing of water resource data globally 24 September BREAKOUT :30-15:00.
Report of the Architecture and Data Committee (ADC) R.Shibasaki (ADC, Japan)
Breakout Session 2.2: A sustainable GEO Information System of Systems Chair: Lorenzo Bigagli Rapporteur: Greg Yetman.
© Copyright 2015 STI INNSBRUCK PlanetData D2.7 Recommendations for contextual data publishing Ioan Toma.
Metadata Standards Directory Alex Ball, Jane Greenberg, Keith Jeffery, Rebecca Koskela.
Open Science (publishing) as-a-Service Paolo Manghi (OpenAIRE infrastructure) Institute of Information Science and Technologies Italian Research Council.
ICSU-WDS & RDA Data Publication Services WG. 2 Linking Research Data and the Literature: why? Why link? 1.Increase visibility & discoverability of research.
What do Open APIs mean for our organization
IPDA Architecture Project International Planetary Data Alliance IPDA Architecture Project Report.
The RESEARCH DATA ALLIANCE WG: Brokering Governance Wim Hugo – ICSU-WDS/ SAEON.
MICHAEL and the European Digital Library: promoting teaching, learning and research The MICHAEL Project is funded under the European Commission eTEN Programme.
Data Foundations And Terminology (DFT) IG
CESSDA SaW Training on Trust, Identifying Demand & Networking
Paul Eglitis [IEEE] and Siri Jodha S. Khalsa [IEEE]
User Characterization in Search Personalization
Current and Upcoming RDA Recommendations Dr. ir. Herman Stehouwer
WHY? - Found initiative while case statement preparation
Justin Buck OceanSITES data Incentives for participation: Data citation & data services Justin Buck
Wheat Data Interoperability Esther DZALE YEUMO KABORE Richard FULSS
Toward FAIR Semantic Resources
OzNome 5-star Tool: A Rating System for making data FAIR and Trustable
General Finnish DMP Guidance
EOSChub / OpenAIRE-Advance
Data Discovery Paradigms Interest Group Report on Activities and Outputs Anita de Waard, Siri Jodha Singh Khalsa Fotis Psomopoulis Mingfang Wu.
Pilot project training
RDA/TDWG Metadata Standards for Attribution of Physical and Digital Collections Stewardship Anne E Thessen, Matt Woodburn, Dimitris Koureas 21 Sept, 2017/Montreal,
INSPIRE Geoportal Thematic Views Application
Data Management: Documentation & Metadata
EOSCpilot Skills Landscape & Framework
EOSC Governance Development Forum
Metadata for research outputs management
EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal
EOSC services architecture
The JISC IE Metadata Schema Registry
An ecosystem of contributions
Geospatial and Problem Specific Semantics Danielle Forsyth, CEO and Co-Founder Thetus Corporation 20 June, 2006.
European Open Science Cloud All Hands Meeting Pisa 8-9 March 2018
6.2 data interoperability Rafael C Jimenez ELIXIR
2. An overview of SDMX (What is SDMX? Part I)
Accommodating local cataloguing traditions in a global context
Tech introduction.
Session 2: Metadata and Catalogues
Google Dataset Search Evaluation
A Case Study for Synergistically Implementing the Management of Open Data Robert R. Downs NASA Socioeconomic Data and Applications.
Antoine Isaac SEMIC conference
Bird of Feather Session
Hans Dufourmont Eurostat Unit E4 – Structural Funds
MSDI training courses feedback MSDIWG10 March 2019 Busan
Data Discovery Paradigms
Brokering as a Core Element of EarthCube’s Cyberinfrastructure
Hans Dufourmont Eurostat Unit E4 – Structural Funds
Joint Metadata Session Alex Ball, Keith Jeffery, Rebecca Koskela
A Research Data Catalogue supporting Blue Growth: the BlueBRIDGE case
GEO Knowledge Hub: overview
Data + Research Elements What Publishers Can Do (and Are Doing) to Facilitate Data Integration and Attribution David Parsons – Lawrence, KS, 13th February.
EOSC-hub Contribution to the EOSC WGs
Australian and New Zealand Metadata Working Group
Presentation transcript:

Data Discovery Paradigms Interest Group 4 April, 2019 RDA 13th Plenary Meeting, Philadelphia Siri Jodha Singh Khalsa Fotis Psomopoulos Mingfang Wu

Charter Data Discovery Paradigms Interest Group: Motivation: Helping to make research data Findable to support users in discovering data regardless of the manner in which it is stored, described and exposed. Explore shared issues among those who search for data, and those who build and operate systems that enable data search. Goals: Provide a forum where representatives across the spectrum of stakeholders and roles can explore how to improve data discovery. Produce actionable recommendations for data producers, data repositories and data seekers.

Ongoing TF related to metadata Metadata enrichment To describe and catalog various efforts to enrich research data metadata sets to satisfy several use cases Data discovery on granularity Metadata should be created in different level of granularity to support granular levels of access Using schema.org for research dataset discovery

Schema.org The schemas are a set of 'types', each associated with a set of properties. The types are arranged in a hierarchy. Schema.org provides a collection of shared vocabularies webmasters can use to mark up their pages in ways that can be understood by the major search engines: Google, Microsoft, Yandex and Yahoo! https://w3c.github.io/dxwg/dcat/#dcat-sdo

Task force objectives Common elements across research domains Objective 1 - Define research schemas types and minimum information guidelines for discoverability and accessibility Objective 2 - Crosswalk and gap analysis evaluating existing standards and guidelines Domain specific elements Objective 3 - Review existing efforts working on Schemas to describe research types Objective 4 - Engagement and communication strategy; collaboration and with existing efforts

Survey: Objective This survey will gather information on existing work involving schemas to describe research data and related resources. Analysis of the survey results will help repositories and the proposed working group understand current practices, identify commonalities, gaps and barriers in using schemas for describing and discovering research datasets. It is envisaged that the survey results can inform the work group in planning its objectives and deliverables, along with sharing practices between data repositories.

Survey: current practices in using Schemas to describe research datasets Repository/Catalogue profile Organisation name, URL of catalogue, domain covered, metadata schema(s) Current status of applying schema.org Mapping from/to schema.org, mapping between other (non schema.org) schemas, the way schema.org is being applied Issue identification Missing resource type, property, or relation property Suggestions to the research schemas ‘working group’

Survey: Catalogue profile 20 participations, dated on 25/03/2019 DCAT-AP, DCAT DataCite Schema.org ISO2146 DataCite + Dublin Core Other: Biogeochemical Dynamics, Culture Heritage EML DATS DDI Types of metadata schemas (19/20) Domains covered by participating catalogues (20/20)

Survey: Current status of applying schema.org Mapping from/to schema.org (15/19) EML -> schema.org B2FIND <-> schema.org ISO2146/RIF-CS -> schema.org Dataverse -> schema.org CATS -> schema.org HCLS -> schema.org DCAT-AP -> schema.org Ways of applying schema.org (16/19) Mark up of landing page in JSON-LD (8) Metadata schema (6) Possibly complementing (3)

Survey: Missing resource type/relation property/property Scientific measurement Environmental entities Data services / APIs Tissue samples Data access arrangements Data reuse conditions/consent Data Controller (legal frameworks) Performances Digital artefacts Some from DataCite ResourceTypeGeneral, e.g. DataPaper, Model, Workflow Relation property Dataset -> FundingAward Dataset -> Cruise (Event) Dataset -> Funder Study -> Study design Many from DataCite <relationType>, e.g. IsCitedBy, HasVersion, IsNewVersionOf, ... Issues: Mapping multiple relation types into one Not sure if predicates (e.g. in the OBO Foundry Relation Ontology (RO), EnvO, and SWEET) are expressible Property Keyword -> external vocabulary (e.g DefinedTerm, CategoryCode) Controlled vocabuary from DataCite <dateType> , e.g. Accepted, Available, Copyrighted, Updated, etc. Some semantic difference, e.g. schema:Dataset:name, DataCite:Author:name Specific term to generic term, e.g. dct:provenance to schema:description

Survey: Feedback (1) Why is schema.org meanwhile the one and only target metadata schema? Unfettered schema heterogeneity will hinder interoperability, so some mention of harmonisation strategies to be provided in the (still-to-be-developed) RDA Guidelines would be useful if we are hoping for: close data compatibility if not integratability. Schema.org (for metadata) rarely uses or allows for the use of controlled terminology (i.e. semantics) in the data values, and it does not make any effort to establish constraints, which result in a large set of metadata not actionable, validatable, or interoperable. Schema.org is not supporting some basic metadata information which is common across domains. This includes controlled vocabularies/thesauri/code lists

Survey: Feedback (2) Developing/describing use cases and examples One valuable contribution would be to either provide a list of well-tested software for “collecting” semantic assets (vocabularies, schemas, ontologies …) or host such a repository themselves Being able to document consistently “semantic assets” would be fundamental for enabling interoperability via re-use. Would like to add Arts & Humanities in addition to scientific types. The ‘problem’ of sensitive data and managed access datasets needs to be tackled by any group attempting to provide a comprehensive metadata framework.

Bioschemas profiles: types in context Leyla Garcia: Bioschemas

github: ESIPFed/science-on-schema.org Adam Shepherd: Science-on-schema Tailored to Geoscience and Data - proper citation of DOIs (knowing the Datacite and Crossref are also generating schema.org) how do we help repositories publish in ways that make it clear to harvesters such as Google that the DOI for this Dataset is same DOI found at Datacite.

Thank you ...