New input for CEOS Persistent Identifier Best Practices

New input for CEOS Persistent Identifier Best Practices R. MORENO

Context: Inter-Pole & DOI
RDA
Differences between LTDP and RDA
RDA recommendations for DOI
CNES & DOI

Context: Inter-Pole & DOI
4 data poles
  Aeris (http://www.aeris-data.fr/): the "atmosphere and service data pole". It facilitates and enhances the use of atmospheric data, whether from satellites, ground-based instruments, airplanes, or balloons. It generates products from observations and also provides many support services: help with using the data, with conducting campaigns, and with interfacing with models.
  Ocean (http://www.pole-ocean.fr/en/The-Pole-Ocean)
  Form@ter (http://poleterresolide.fr/?l=en): solid Earth observation.
  Theia (https://www.theia-land.fr/en/presentation/theia): national inter-agency organization designed to foster the use of images from space observation of land surfaces.

Context: Inter-Pole & DOI
Objective: share common practices / standards
  Data preservation
  DOI
  Single Sign On
  Interoperability
  Thesaurus
  …
20 January 2016: technical workshop
  One theme: DOI
  Several presentations, including:
    INIST: French regional PID agent
    RDA: analysed the CEOS recommendations

RDA
RDA: Research Data Alliance (https://rd-alliance.org/)
The Research Data Alliance (RDA) builds the social and technical bridges that enable open sharing of data. The RDA vision is researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.
RDA produced recommendations on DOI: https://rd-alliance.org/system/files/documents/RDA-DC-Recommendations_151020.pdf

Differences between LTDP and RDA
Similarities
  Numbering should be completely opaque.
  New version => new PID.
Differences
  Citation at collection / data set level => loss of fine-grained citation.
  Assign a single PID for a whole time series, even if new data are still being added => loss of reproducibility.
  Use one PID for a multi-satellite time series, as long as the series is internally consistent => potentially very long citations.

RDA recommendations for DOI
Preparing the Data and the Query Store
Prepare existing data sources and provide the infrastructure required to implement the query-based approach.
  R1 – Data Versioning: Apply versioning to ensure earlier states of data sets can be retrieved.
  R2 – Timestamping: Ensure that operations on data are timestamped, i.e. any additions and deletions are marked with a timestamp.
  R3 – Query Store Facilities: Provide means for storing queries and the associated metadata in order to re-execute them in the future.
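R1 and R2 can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration (the class names `VersionedStore` and `RecordVersion`, integer logical timestamps, string values); the RDA recommendations do not prescribe any particular implementation. The idea is that updates and deletions append a timestamped version instead of overwriting, so an earlier state can be reconstructed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RecordVersion:
    value: Optional[str]  # None marks a deletion (R2: deletions are timestamped too)
    valid_from: int       # logical timestamp of the operation (hypothetical unit)


class VersionedStore:
    """Hypothetical append-only store: nothing is ever overwritten (R1)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[RecordVersion]] = {}

    def put(self, key: str, value: str, ts: int) -> None:
        # Assumes operations arrive in non-decreasing timestamp order.
        self._history.setdefault(key, []).append(RecordVersion(value, ts))

    def delete(self, key: str, ts: int) -> None:
        self._history.setdefault(key, []).append(RecordVersion(None, ts))

    def snapshot(self, ts: int) -> Dict[str, str]:
        """Return the data set as it existed at timestamp ts."""
        state: Dict[str, str] = {}
        for key, versions in self._history.items():
            current: Optional[RecordVersion] = None
            for version in versions:
                if version.valid_from <= ts:
                    current = version  # latest version not newer than ts
            if current is not None and current.value is not None:
                state[key] = current.value
        return state
```

Calling `snapshot(ts)` with the timestamp stored alongside a query is what would let that query be re-executed against the data as it was when the query was first issued.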

RDA recommendations for DOI
Persistently Identify Specific Data Sets
When a data set should be persisted, the following steps need to be applied:
  R4 – Query Uniqueness: Re-write the query to a normalised form so that identical queries can be detected. Compute a checksum of the normalised query to efficiently detect identical queries.
  R5 – Stable Sorting: Ensure that the sorting of the records in the data set is unambiguous and reproducible.
  R6 – Result Set Verification: Compute fixity information (checksum) of the query result set to enable verification of the correctness of a result upon re-execution.
  R7 – Query Timestamping: Assign a timestamp to the query based on the last update to the entire database (or the last update to the selection of data affected by the query, or the query execution time). This allows retrieving the data as it existed at the time a user issued a query.
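The following sketch shows one way R4 to R6 could fit together. The query representation (a dict of filters plus a list of columns), the function names, and the choice of SHA-256 are all assumptions made for this example; a real system would normalise its own query language. It also assumes that column order carries no meaning and that each row has a unique sort key.

```python
import hashlib
import json
from typing import Dict, List


def normalise_query(filters: Dict[str, str], columns: List[str]) -> str:
    """R4: serialise a query so that equivalent queries give the same string."""
    canonical = {
        "columns": sorted(c.strip().lower() for c in columns),
        "filters": {k.strip().lower(): v for k, v in filters.items()},
    }
    # sort_keys and fixed separators make the serialisation deterministic.
    return json.dumps(canonical, sort_keys=True, separators=(",", ":"))


def checksum(text: str) -> str:
    """SHA-256 hex digest, used here for both query and result fixity."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def result_checksum(rows: List[Dict[str, str]], key: str) -> str:
    """R5 + R6: sort rows on a unique key, then hash the ordered result set."""
    ordered = sorted(rows, key=lambda row: row[key])
    return checksum(json.dumps(ordered, sort_keys=True, separators=(",", ":")))
```

Because the rows are sorted on a unique key before hashing, the same result set delivered in a different order yields the same checksum, while any change to a value yields a different one.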

RDA recommendations for DOI
Persistently Identify Specific Data Sets (continued)
When a data set should be persisted, the following steps need to be applied:
  R8 – Query PID: Assign a new PID to the query if either the query is new or the result set returned from an earlier identical query is different due to changes in the data. Otherwise, return the existing PID.
  R9 – Store Query: Store the query and its metadata (e.g. PID, original and normalised query, query and result set checksums, timestamp, superset PID, data set description, and others) in the query store.
  R10 – Automated Citation Texts: Generate citation texts in the format prevalent in the designated community to lower the barrier for citing the data. Include the PID in the citation text snippet.
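The decision rule in R8 can be sketched as below, together with a minimal query store (R9) and citation snippet (R10). The class and function names, the `"example"` prefix, the use of a random UUID as an opaque suffix, and the citation layout are all hypothetical; a real deployment would mint PIDs through its registration agency and follow its community's citation format.

```python
import uuid
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueryRecord:
    pid: str
    query_checksum: str
    result_checksum: str
    timestamp: int


class QueryStore:
    """Hypothetical in-memory query store (R9)."""

    def __init__(self, prefix: str = "example") -> None:
        self._prefix = prefix  # placeholder, not a registered PID prefix
        self._records: List[QueryRecord] = []

    def register(self, query_checksum: str, result_checksum: str,
                 timestamp: int) -> QueryRecord:
        """R8: reuse the PID only if both query and result set are unchanged."""
        for record in self._records:
            if (record.query_checksum == query_checksum
                    and record.result_checksum == result_checksum):
                return record
        # New query, or same query over changed data: mint an opaque PID.
        pid = f"{self._prefix}/{uuid.uuid4().hex}"
        record = QueryRecord(pid, query_checksum, result_checksum, timestamp)
        self._records.append(record)
        return record


def citation_text(creator: str, title: str, year: int, pid: str) -> str:
    """R10: build a citation snippet that includes the PID (layout is assumed)."""
    return f"{creator} ({year}): {title}. {pid}"
```

Using a random suffix keeps the identifier opaque, in line with the similarity noted between the LTDP and RDA approaches.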

RDA recommendations for DOI
Resolving PIDs and Retrieving the Data
  R11 – Landing Page: Make the PIDs resolve to a human-readable landing page that provides the data (via query re-execution) and metadata, including a link to the superset (PID of the data source) and the citation text snippet.
  R12 – Machine Actionability: Provide an API / machine-actionable landing page to access metadata and data via query re-execution.
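As a rough illustration of R12, a machine-actionable landing page could expose the same metadata as the human-readable page in a structured form such as JSON. The field names and the function `landing_page_json` below are assumptions for this sketch only; the recommendations do not define a schema.

```python
import json


def landing_page_json(pid: str, superset_pid: str, citation: str,
                      query_timestamp: int) -> str:
    """Serialise landing-page metadata for machine consumption (R12)."""
    record = {
        "pid": pid,                          # PID of the query / subset
        "superset_pid": superset_pid,        # PID of the data source (R11)
        "citation": citation,                # citation text snippet (R10)
        "query_timestamp": query_timestamp,  # needed to re-execute the query (R7)
    }
    return json.dumps(record, indent=2, sort_keys=True)
```

A client could parse this record and use the timestamp and superset PID to request re-execution of the stored query.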

RDA recommendations for DOI
Upon Modifications to the Data Infrastructure
  R13 – Technology Migration: When data is migrated to a new representation (e.g. a new database system, a new schema, or a completely different technology), also migrate the queries and associated fixity information.
  R14 – Migration Verification: Verify successful data and query migration, ensuring that queries can be re-executed correctly.
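R14 can be sketched as a check that re-executes every stored query on the new system and compares the result checksum with the one recorded before migration. The names `fixity` and `verify_migration`, the callable `execute` (standing in for query re-execution on the migrated system), and the SHA-256 over sorted JSON rows are assumptions for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, List

Rows = List[Dict[str, str]]


def fixity(rows: Rows, key: str) -> str:
    """Checksum of a result set after sorting on a unique key."""
    ordered = sorted(rows, key=lambda row: row[key])
    payload = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_migration(stored: Dict[str, str],
                     execute: Callable[[str], Rows],
                     key: str) -> List[str]:
    """Return the PIDs whose re-executed result no longer matches.

    stored maps each PID to the checksum recorded before migration;
    execute(pid) re-runs the stored query on the migrated system.
    """
    return [pid for pid, expected in stored.items()
            if fixity(execute(pid), key) != expected]
```

An empty return value means every stored query reproduced its original result set on the new system.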

Result of LTDP WG Analysis
The Research Data Alliance (RDA) is composed of different WGs; the RDA recommendations above were delivered by the Scalable Dynamic Data Citation Working Group under the Data Citation WG.
The RDA recommendation gives a set of 14 clear rules for making dynamic data citable.
The approach is aimed at dynamic data, where any part of the data set could change at any time. To give a PID to such a data set, some kind of timestamp is needed to make sure the right version is retrieved.
Earth Observation time series data sets can be dynamic, but only in a very limited way. If anything within the existing time frame changes, a new version and a new PID are needed.
The CEOS best practices are really for larger-scale data archives.
COPIL PEPS – 4 December 2015   27/11/2018

Way Forward
No changes in our PID Best Practices.
Discuss with the Data Citation WG members to explain our context and the meaning of our recommendations.
After a conversation with the Data Citation WG, some minor updates could be evaluated:
  Re-wording of some recommendations to make our document clearer.
  New use case: if truly dynamic data sets need to be managed, where any part of the data set could potentially change at any time, consider the RDA time-stamped query store approach.