Toward Adoption of RDA Outcomes by US Ocean Science Data Repositories Cynthia Chandler, Bob Arko, Adam Shepherd.

2 US Ocean Science Domain Repositories
- BCO-DMO: Biological and Chemical Oceanography Data Management Office (WHOI)
  - Curation of marine ecosystem data contributed by NSF-funded investigators
- R2R: Rolling Deck to Repository
  - Curation of routine underway data from the US academic fleet, and an authoritative expedition catalog
- Members of the Marine Data Harmonization IG

3 Adoption Process Stages
- Awareness – “I just became aware that the output exists.”
- Interest – “I heard about the output, have learned about it, and now have an active interest.”
- Evaluation – “I’ve got a strong interest and am willing to commit to evaluating the output.”
- Trial – “I’ve evaluated the output, decided I like it, and am willing to give it a try.”
- Adoption – “It really works!! I’ve decided to adopt it and make it part of the system, Yay!”

4 Outcomes of Seven Groups Have Potential for Adoption by Ocean Science Data Repositories
- Data Citation
- Data Type Registries
- PID Information Types
- Data Foundation & Terminology
- Practical Policy
- Metadata
- Data Publishing (3 of 4 groups)

5 What does RDA offer domain repositories?
- BCO-DMO: transition from human to machine clients
- Ocean science is interdisciplinary
- Data curation is distributed
- RDA outcomes can help address challenges of
  - Interoperability between distributed systems in ocean sciences
  - Interoperability between different domains (natural, social)
- Need solutions that scale for 'Big Data' (VARIETY, VERACITY, velocity, volume)
- RDA outcomes are developed and vetted by representatives from multiple domains

6 Data Citation (DC) of evolving data
- DC goals: to create identification mechanisms that
  - allow us to identify and cite arbitrary views of data, from a single record to an entire data set, in a precise, machine-actionable manner
  - allow us to cite and retrieve that data as it existed at a certain point in time, whether the database is static or highly dynamic
- DC outcomes: 13 recommendations and associated documentation
  - ensuring that data are stored in a versioned and timestamped manner
  - identifying data sets by storing and assigning persistent identifiers (PIDs) to timestamped queries that can be re-executed against the timestamped data store

7 Description of Data Citation Outputs
»» Data Versioning: To retrieve earlier states of datasets, the data need to be versioned. Markers shall indicate inserts, updates and deletes of data in the database.
»» Data Timestamping: Ensure that operations on data are timestamped, i.e. any additions or deletions are marked with a timestamp.
»» Data Identification: The data used shall be identified via a PID pointing to a timestamped query, resolving to a landing page.
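The three outputs above (versioning, timestamping, and identification via a stored query) can be sketched roughly as follows. This is a minimal illustration, not the working group's reference implementation: the table layout, the function names, the integer timestamps, and the "example-pid:" identifier scheme are all assumptions made for the example, and a real system would mint PIDs through a provider such as a DOI service.

```python
import hashlib
import sqlite3


def make_store():
    """Create an in-memory table whose rows carry validity timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE obs (station TEXT, temp_c REAL, "
        "valid_from INTEGER, valid_to INTEGER)"
    )
    return conn


def insert(conn, station, temp_c, ts):
    # A new row version stays open-ended (valid_to IS NULL) until superseded.
    conn.execute("INSERT INTO obs VALUES (?, ?, ?, NULL)", (station, temp_c, ts))


def delete(conn, station, ts):
    # Deletion only closes the current version; nothing is physically removed.
    conn.execute(
        "UPDATE obs SET valid_to = ? WHERE station = ? AND valid_to IS NULL",
        (ts, station),
    )


def update(conn, station, temp_c, ts):
    # An update is a timestamped delete followed by a timestamped insert.
    delete(conn, station, ts)
    insert(conn, station, temp_c, ts)


def cite(query_store, where, params, ts):
    """Store a timestamped query and return a made-up identifier for it."""
    digest = hashlib.sha256(repr((where, tuple(params), ts)).encode()).hexdigest()
    pid = "example-pid:" + digest[:12]  # hypothetical scheme, not a real PID
    query_store[pid] = (where, tuple(params), ts)
    return pid


def resolve(conn, query_store, pid):
    """Re-execute the stored query against the data as of its timestamp."""
    where, params, ts = query_store[pid]
    sql = (
        "SELECT station, temp_c FROM obs "
        f"WHERE ({where}) AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?) ORDER BY station"
    )
    return conn.execute(sql, params + (ts, ts)).fetchall()
```

Because rows are never overwritten, resolving an older identifier returns the subset exactly as it stood at citation time, even after later updates or deletions.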

8 Adoption of Data Citation Outputs
- Evaluation
  - Evaluate recommendations
  - Try implementation in existing systems
- Trial
  - BCO-DMO: R1-11 fit well with current architecture; R12 doable; test as part of DataONE node membership
  - R2R: curation of original field data and selected subset of post-field products (ship track), so no evolving data
  - Both working with DataONE as our aggregation system and service provider

9 Data Type Registry (DTR)
DTR mission: see if it is possible to make implicit assumptions about data contained in datasets explicit, and to share these assumptions programmatically using types and type registries.
DTR outcomes: 1 website and 1 API. The registry website provides a user interface for describing both simple and complex data types used for data within a project, and for searching data types created by others. The API provides a way to interact with the registry programmatically, including the ability to import data type descriptions.

10 Description of DTR Outputs
1. Data Type Registry
   - A website with a GUI that lets an authorized user describe the data types used in data products.
2. Data Type Registry API
   - An API that, among other things, creates JSON representations of the information about data types. This is a pointer to the API specification implemented in the Data Type Registry mentioned above.
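To make the idea of a registry that exports and imports JSON type descriptions more concrete, here is a small in-memory sketch. It does not use or reproduce the actual DTR API: the `DataType` fields, the `TypeRegistry` class, and the "example:" identifiers are hypothetical names chosen only to show the round trip from a type description to JSON and back.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataType:
    """A hypothetical type description; field names are illustrative only."""
    type_id: str      # made-up identifier, not a real registry PID
    name: str
    unit: str
    value_kind: str   # one of "float", "int", "str"


class TypeRegistry:
    """Toy stand-in for a data type registry, kept entirely in memory."""

    def __init__(self):
        self._types = {}

    def register(self, data_type):
        self._types[data_type.type_id] = data_type

    def to_json(self, type_id):
        # Export a JSON representation of one registered type.
        return json.dumps(asdict(self._types[type_id]), sort_keys=True)

    def import_json(self, text):
        # Import a type description produced by another registry instance.
        data_type = DataType(**json.loads(text))
        self.register(data_type)
        return data_type

    def check(self, type_id, value):
        # Make the otherwise implicit assumption about a value explicit.
        kinds = {"float": float, "int": int, "str": str}
        return isinstance(value, kinds[self._types[type_id].value_kind])
```

A repository could register a type once, share the JSON, and let a second system import it and validate incoming values against the same description.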

11 Adoption of DTR Outputs
- Evaluation
  - evaluate the registry and API for use in existing data repositories
  - try using the prototype registry to record a set of data types, then provide example code that uses the API to access the information in the data type registry
- Trial
  - BCO-DMO, R2R, GeoLink? data type determined by instrument type
  - R2R already maintains a de facto library of file types for environmental sensor systems in the US research fleet, in collaboration with NCEI and Chronopolis. We could publish this as a formal Data Type Registry.

12 PID Information Type (PID-IT)
PID-IT goal: Provide a way to harmonize the information types (and associated information) attached to PIDs across disciplines and PID providers. (Also to provide technical solutions.)
PID-IT persistent outcomes: Types for example use cases have been registered in the type registry developed in this WG.

13 Description of PID-IT Outputs
1. Type Examples and Illustration Use Cases
   - These are examples of …
2. API Description
   - A description of the API used to access the PID registry created by this group
3. API Prototype Implementation
   - A working version of the API connected to the PID registry that has been created
4. Registry Prototype
   - The registry prototype itself
5. Client demonstrator GUI
   - Demonstration of the registry and its use via a graphical user interface developed by the group’s intern.
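The general idea behind typed PID information, where each property stored with a PID is keyed by a registered type rather than a free-text label, might look like the sketch below. Everything here is assumed for illustration: the identifiers, the registry contents, and the function names are invented and do not reflect the group's actual API or prototype.

```python
# Hypothetical type registry: type identifier -> description of the property.
TYPE_REGISTRY = {
    "example-type:checksum": {"name": "checksum", "value_kind": "str"},
    "example-type:created": {"name": "creation_date", "value_kind": "str"},
    "example-type:size": {"name": "size_bytes", "value_kind": "int"},
}

# Hypothetical PID records: each property is keyed by a registered type id.
PID_RECORDS = {
    "example-pid:0001": {
        "example-type:checksum": "sha256:ab12",
        "example-type:created": "2015-09-01",
        "example-type:size": 2048,
    }
}


def read_property(pid, type_id):
    """Return one typed property of a PID record, or None if it is absent."""
    return PID_RECORDS[pid].get(type_id)


def describe(pid):
    """Translate a PID record into readable names using the type registry."""
    readable = {}
    for type_id, value in PID_RECORDS[pid].items():
        entry = TYPE_REGISTRY.get(type_id)
        if entry is not None:  # skip properties whose type is not registered
            readable[entry["name"]] = value
    return readable
```

Because both PID providers would point at the same registered types, a client can interpret a record without knowing provider-specific field names.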

14 Adoption of PID-IT Outputs
- Evaluation
  - evaluate the client registry GUI
- Trial
  - BCO-DMO and R2R PID systems in use: DOI (datasets and expeditions), ORCID, FundRef (US awards), ISNI (global organization), IGSN (global samples), re3data (repository), and domain-specific identifiers for instruments and measurements
  - Possible: R2R, BCO-DMO and NCEI (US ocean archive) joining DataONE; the DataONE architecture is well aligned with the PID-IT approach, so perhaps DataONE adopts the PID-IT API and offers it as a service to the community

15 Data Foundations and Terminology (DFT)
DFT mission: to understand what the core of the RDA data domain is, and then to develop definitions of core terms based on data models. This work is part of the effort to form agreement on RDA culture.
DFT persistent outcomes: 4 documents and 1 wiki tool that summarize the group's work on terminology. The wiki tool is intended to be used by other RDA WGs and IGs to extend the terminology beyond the terms determined to date.

16 Description of DFT Outputs
1. Overview Document: model descriptions
   - Report on the discussion about a large number of data models
2. Analysis & Synthesis Document
   - Report on the analysis of the data models considered by the group
3. Term Snapshot Document
   - Report on a snapshot of core terms that have been identified
4. Use Cases (1), (2)
   - Use cases that describe how other working groups use the terms the group has discussed
5. Semantic Media Wiki Term Tool
   - Tool to capture the initial list of terms and definitions for DFT WG discussions, open for others to use (it is kind of persistent at this point)
6. Report of Interactions with others about terms
   - Summary of conversations with ~120 individuals about data in the context of DFT findings.

17 Adoption of DFT Outputs
- Evaluation
  - evaluate core terms in Semantic Media Wiki Term Tool for ocean science domain as they extend our current term reference source
- Trial
  - BCO-DMO: map local system terms to the DFT terms; adding deployment type (cruise, dive, float, experiment)
  - R2R: already essentially follows this model; publishes Collections that represent a field expedition (research cruise), having a Persistent Identifier (DOI)
- Challenges
  - Need dereferenceable term URIs (DTR?)
  - Does the Semantic Wiki Tool provide relationships or a way to describe them? Is each term a concept in an ontology (e.g. OWL file)? Governance?
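The trial step of mapping local system terms to DFT terms could start from something as simple as a lookup table plus a report of what remains unmapped. In this sketch the target labels "Collection" and "Persistent Identifier" are taken from the slide above, while the local term names and the `map_terms` helper are assumptions for illustration; a real mapping would point at dereferenceable term URIs once those exist.

```python
# Illustrative mapping from (hypothetical) local terms to DFT term labels.
LOCAL_TO_DFT = {
    "cruise": "Collection",
    "doi": "Persistent Identifier",
}


def map_terms(local_terms, mapping):
    """Split local terms into those with a DFT match and those without."""
    mapped = {}
    unmapped = []
    for term in local_terms:
        if term in mapping:
            mapped[term] = mapping[term]
        else:
            unmapped.append(term)  # candidates for new or extended terms
    return mapped, unmapped
```

The unmapped list is the useful by-product: it shows where the local vocabulary (for example, deployment type) would extend the shared terminology.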

18 Opportunity...
- Interaction of DFT, PID-IT, DDRI and DTR?
  - Registering terms/types at DTR (federated system of distributed, production-ready registries)
- What is the appropriate ‘level’ for a registry?
  - Professional, domain-specific societies? (ASLO, AGU, EGU)
  - Institutional library?
  - Community organization (ESIP, OGC, ODIP)?
- Not having operational registries may hinder adoption

19 Practical Policy (PP)
PP goals: To enable sharing, revising, adapting, and re-using of computer-actionable policies for sharing data, particularly in a data repository; to suggest a set of generic policies to be applied to our data; and to collect and register practical policies.
PP persistent outcomes: Practical Policy (PP) WG recommendation package of Policy Examples and Template Workbooks

20 Description of Practical Policy Outputs
1. Policy Template
   - Template that includes a generic set of policies, suggesting how they can be implemented within a data system.
2. Implementations
   - Policy descriptions and implementation details

21 Adoption of Practical Policy Outputs
- Evaluation
  - evaluate the policy template and implementation documents
- Trial
  - Identify policies that should be documented
  - BCO-DMO and R2R are DOI Publication Agents, but not long-term archives.
  - Consider documenting our practices (for archive and replication) in a computer-actionable format, so data deposits (to NCEI, Chronopolis, DataONE) can be periodically verified as part of a self-audit process
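One way to picture a computer-actionable policy used for a periodic self-audit is a small policy record plus a function that checks deposits against it. This is a sketch under stated assumptions: the policy fields, the archive names, and the in-memory "archives" are hypothetical, and it does not follow the Practical Policy WG template or any archive's real interface.

```python
import hashlib

# Hypothetical policy record; the field names are invented for this example.
POLICY = {"name": "verify-deposit-fixity", "algorithm": "sha256", "min_copies": 2}


def checksum(data, algorithm):
    """Compute a hex digest of bytes with the algorithm named in the policy."""
    return hashlib.new(algorithm, data).hexdigest()


def audit(policy, manifest, archives):
    """Check every deposited object against the policy.

    manifest: object id -> checksum recorded at deposit time
    archives: archive name -> {object id -> bytes currently held}
    """
    report = {}
    for object_id, expected in manifest.items():
        verified = [
            name
            for name, holdings in archives.items()
            if object_id in holdings
            and checksum(holdings[object_id], policy["algorithm"]) == expected
        ]
        report[object_id] = {
            "verified_in": sorted(verified),
            "compliant": len(verified) >= policy["min_copies"],
        }
    return report
```

Keeping the policy as data rather than as prose means the same audit code can be rerun on a schedule and the policy itself can be shared or revised independently.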

22 Metadata
- Metadata group goals:
  - Set up a sustainable, community-driven RDA Metadata Standards Directory, designed for users rather than automated tools, that provides brief details for common research data.
  - Compile a set of use cases that analyze and document the various ways in which metadata can be used (e.g. for discovery, exchange, re-use, etc.)
- Metadata group outcomes:
  - UK DCC Disciplinary Metadata Standards Catalogue
  - functional GitHub prototype directory with version control

23 Adoption of Metadata Group Outputs
- Evaluation
  - Compare DCC lists with current practices
  - Identify standards where we currently have none
  - Identify mismatches and consider addressing them
- Current Status
  - BCO-DMO: ISO (19139 compliant), DIF, CF via NVS, O&M, PROV, RDF, DCAT, Dublin Core, (OAI-ORE soon), (used to do FGDC, but dropped that recently in favor of ISO 19115)
  - R2R: all of the above plus DataCite and IGSN for samples

24 Publishing Data Working Groups
- Publishing Data Workflows
- Publishing Data Services
- Publishing Data Bibliometrics
- Evaluate and Monitor:
  - R2R and BCO-DMO will evaluate these outcomes and look for ways to implement them in our current architecture, or in relevant communities (promoting recommended practices)
  - GeoLink (NSF EarthCube) is using Linked Data (Semantic Web technologies) to connect data and publications
  - meaningful data use statistics would be very welcome

25 Infrastructure Components
- Supporting the modern research endeavor is like creating a quilt: a work of art created by a community of practitioners with a shared goal. Each member of the community lovingly and laboriously designs and constructs their piece of the quilt.

26 Creating the Infrastructure for a Domain
- Putting the pieces together to create the ‘whole’ block for a domain.

27 RDA
- Combining the domains,
- and adding the unifying framework,
- to create the global research data quilt (RDA)

Thank you!

29 Others (6 other group outcomes available by Sep 2015)
- Data Description Registry Interoperability (DDRI)
  - Infrastructure providers & data librarians to find connections across research data registries and create global views of research data.
- Repository Audit and Certification DSA–WDS
  - A convergent DSA-WDS certification standard will help eliminate duplication of effort and increase certification procedure coherence and compatibility, thus benefitting researchers, data managers, librarians and scientific communities.
- RDA/WDS Publishing Data Bibliometrics
- RDA/WDS Publishing Data Services
  - universal interlinking service between data and scientific literature
- RDA/WDS Publishing Data Workflows
- Wheat Interoperability
