
1 GEOSS Data Management Principles Implementation: Core Trustworthy Data Repository Certification
Michael Diepenbroek (PANGAEA), Wim Hugo (NRF/SAEON), Mustapha Mokrane (World Data System, WDS)
GEO Data Providers Workshop, Florence, Italy, 21 April 2017

2 https://goo.gl/ipEBmW

3 Core Trustworthy Data Repository Requirements
Catalogue of 16 requirements:
• Context
• Organizational infrastructure (6)
• Digital object management (8)
• Technology (2)
• Applicant feedback

4 Core Certification procedure
• Self-assessment in an online tool, with guidance:
  - Documented and public evidence (URLs)
  - Compliance levels / maturity ratings
• Peer review (2 reviewers)
• Certification as a Core Trustworthy Data Repository (3 years)
• Self-assessments publicly available
• Renewal of certification (every 3 years)
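A minimal sketch of the procedure above as a data model may help applicants see how the pieces fit together. It assumes, for illustration only, a 0-4 compliance scale, one response per requirement R1-R16 with at least one public evidence URL each, and approval by both reviewers. The class and function names (`RequirementResponse`, `SelfAssessment`, `certify`) are hypothetical and do not describe the actual online tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

REQUIREMENT_IDS = [f"R{i}" for i in range(1, 17)]  # the 16 core requirements
REQUIRED_REVIEWERS = 2   # peer review by two reviewers
VALIDITY_YEARS = 3       # certification valid for 3 years, then renewal


@dataclass
class RequirementResponse:
    requirement_id: str                   # e.g. "R3"
    compliance_level: int                 # assumed 0-4 scale (illustrative only)
    evidence_urls: List[str] = field(default_factory=list)


@dataclass
class SelfAssessment:
    repository: str
    responses: List[RequirementResponse]
    reviewer_approvals: List[bool] = field(default_factory=list)

    def covers_all_requirements(self) -> bool:
        # Exactly the 16 requirement IDs must be answered, with valid levels.
        ids = {r.requirement_id for r in self.responses}
        levels_ok = all(0 <= r.compliance_level <= 4 for r in self.responses)
        return ids == set(REQUIREMENT_IDS) and levels_ok

    def has_public_evidence(self) -> bool:
        # Every response must point to at least one public URL.
        return all(r.evidence_urls for r in self.responses)

    def approved_by_reviewers(self) -> bool:
        return (len(self.reviewer_approvals) >= REQUIRED_REVIEWERS
                and all(self.reviewer_approvals))


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February when the target year is not a leap year
        return d.replace(year=d.year + years, day=28)


def certify(assessment: SelfAssessment, granted: date) -> Optional[date]:
    """Return the date by which renewal is due, or None if not certifiable."""
    if (assessment.covers_all_requirements()
            and assessment.has_public_evidence()
            and assessment.approved_by_reviewers()):
        return add_years(granted, VALIDITY_YEARS)
    return None


if __name__ == "__main__":
    responses = [RequirementResponse(rid, 4, ["https://example.org/evidence"])
                 for rid in REQUIREMENT_IDS]
    demo = SelfAssessment("Example Repository", responses, [True, True])
    print(certify(demo, date(2017, 4, 21)))  # 2020-04-21
```

Returning the renewal date rather than a plain yes/no keeps the 3-year renewal cycle explicit in the model.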

5 CORE TDR Requirements
Context: repository type, designated community, level of curation performed, outsource partners…
Organizational infrastructure
R1. The data repository (DR) has an explicit mission to provide access to and preserve data in its domain.
R2. DR maintains all applicable licenses covering data access and use, and monitors compliance.
R3. DR has a continuity plan to ensure ongoing access to and preservation of its holdings.
R4. DR ensures that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms.
R5. DR has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance.
R6. DR adopts mechanism(s) to secure ongoing expert guidance and feedback (either in-house or external, including scientific guidance).

6 CORE TDR Requirements: Digital object management
R7. DR guarantees the integrity and authenticity of the data.
R8. DR accepts data and metadata based on defined criteria to ensure relevance and understandability for users.
R9. DR applies documented processes and procedures in managing archival storage of the data.
R10. DR assumes responsibility for long-term preservation and manages this function in a planned and documented way.
R11. DR has appropriate expertise to address technical data and metadata quality sufficient to make quality evaluations.
R12. Archiving takes place according to defined workflows from ingest to dissemination.
R13. DR enables users to discover the data and refer to them in a persistent way through proper citation.
R14. DR enables reuse of the data over time, ensuring that appropriate metadata support the understanding and use of the data.

7 R3. Continuity Plan
The repository has a continuity plan to ensure ongoing access to and preservation of its holdings:
• The level of responsibility undertaken for data holdings, including any guaranteed preservation periods.
• The medium-term (three- to five-year) and long-term (more than five years) plans in place to ensure the continued availability and accessibility of the data.
In particular, both the response to rapid changes of circumstance and long-term planning should be described, indicating options for relocation or transition of the activity to another body, or return of the data holdings to their owners (i.e., data producers). For example, what will happen in the case of cessation of funding, whether through an unexpected withdrawal of funding, a planned ending of funding for a time-limited project repository, or a shift in host institution interests?
Guidance: This requirement covers the measures in place to ensure access to, and availability of, data holdings, both currently and in the future. Reviewers are seeking evidence that preparations are in place to address the risks inherent in changing circumstances.

8 Replies received
XXXX has no explicit limitation on the period for which data and samples are held. However, XXXX established a data management office in 1998 and has operated it continuously since then. The repository depends on government funding, so an abrupt change in funding policy would directly affect its continuity. To mitigate this risk, the government wants XXXX to develop its own funding scheme, mainly from business-oriented applications. However, this is still at a preliminary stage, as it started only one year ago. Because of this funding situation, we do not claim long-term continuity of the data, but we do our best to ensure continuity as long as funding is stable.
Continuity of access is provided at both the hardware and software levels. There is a powerful monitoring system at all stages of processing: the receipt and ingestion of data and the generation of derived information products. For a number of Information Resources, the admissible delay time in the arrival of data is specified. All measures aim to ensure the availability of data on a 24/7 basis.
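The reply above mentions an admissible delay time for data arrival. A minimal sketch of such a check, assuming each Information Resource has one fixed maximum delay, could look like this; the resource names and delay values are invented for illustration and are not taken from the reply.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical admissible delays per Information Resource (illustrative values).
ADMISSIBLE_DELAY: Dict[str, timedelta] = {
    "surface_observations": timedelta(hours=3),
    "satellite_products": timedelta(hours=12),
}


def overdue_resources(last_arrival: Dict[str, datetime], now: datetime) -> List[str]:
    """Return resources whose latest data are older than their admissible delay.

    A resource with no recorded arrival at all is also reported as overdue.
    """
    overdue = []
    for resource, limit in ADMISSIBLE_DELAY.items():
        arrived = last_arrival.get(resource)
        if arrived is None or now - arrived > limit:
            overdue.append(resource)
    return sorted(overdue)


if __name__ == "__main__":
    now = datetime(2017, 4, 21, 12, 0)
    last = {
        "surface_observations": datetime(2017, 4, 21, 10, 0),  # 2 h old: within limit
        "satellite_products": datetime(2017, 4, 20, 20, 0),    # 16 h old: late
    }
    print(overdue_resources(last, now))  # ['satellite_products']
```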

9 Example of good evidence
Continuity of the scientific data stewardship services provided by SEDAC is facilitated by the establishment of a Long-Term Archive (LTA) and contingency plans, developed in collaboration with the Columbia University Libraries, to ensure that scientific data stewardship operations continue in the future. For long-term scientific data stewardship, appraisal and selection are conducted to identify candidate data products for the LTA, which will provide continuing services for each selected data product at the designated level of service. The LTA Board appraises and selects data products for the LTA and designates levels of service for each LTA data product. The LTA Board consists of representatives from the Earth Institute of Columbia University, the Columbia University Libraries, and CIESIN. In the event of a lapse in funding, primary responsibility for the management of the LTA and the stewardship of its resources shifts to the Columbia University Libraries and the Earth Institute of Columbia University.
The USGS EROS Center, like all U.S. federal agencies, is required to have a Continuity of Operations Plan (COO or COOP). Additionally, as part of EROS' risk mitigation strategy, all of our electronic, long-term science data are sent to the U.S. National Archives and Records Administration (NARA) in Lee's Summit, Missouri, which is five hours from our facility. We currently have over 4 PB of data stored on magnetic media at Lee's Summit. Our metadata are stored offsite, separately, at a different facility.

10 R12. Workflows
Archiving takes place according to defined workflows from ingest to dissemination.
• Workflows/business process descriptions.
• Clear communication to depositors and users about the handling of data.
• Levels of security and their impact on workflows (guarding privacy of subjects, etc.).
• Qualitative and quantitative checking of outputs.
• Appraisal and selection of data.
• Approaches towards data that do not fall within the mission/collection profile.
• The types of data managed and any impact on workflow.
• Decision handling within the workflows (e.g., archival data transformation).
• Change management of workflows.
Guidance: To ensure the consistency of practices across datasets and services and to avoid ad hoc and reactive activities, archival workflows should be documented, and provisions for managed change should be in place. Procedures should be adapted to the repository's mission and activities, and procedural documentation for archiving data should be clear.

11 Replies received
Data workflows in the XXXX data management office are documented in manuals in Japanese. XXXX has a workflow that runs from initial contact to publication. A potential data producer first contacts XXXX through one of several contact points, and internal committees with different roles decide whether the data are accepted. Once the data are accepted, a few staff members with different roles work with the data producers to add metadata, ingest the data, and assign a DOI, and finally the data are published on the Web. This workflow is currently in an experimental phase and will be updated after more data have passed through it.
The workflow is controlled by means of control tables. The Life Cycle (LC) Table contains a formalized description of the algorithm for processing the data of each individual Information Resource in the repository. The LC can consist of a number of processing functions, for example, conversion to other units of measurement, calculation of derived characteristics from the available data, accumulation of data over a specified period of time, etc. The complete set of functions is called the processing functions framework.
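The control-table idea in the reply above can be sketched as a registry of processing functions (the framework) plus a per-resource table listing which functions to apply, in order, with their parameters. The resource IDs, function names and parameters below are hypothetical; the three functions only mirror the examples given in the reply (unit conversion, a derived characteristic, and accumulation over a period).

```python
from typing import Callable, Dict, List, Tuple

Series = List[float]


def convert_units(values: Series, scale: float, offset: float = 0.0) -> Series:
    """Conversion to other units: linear transform value * scale + offset."""
    return [v * scale + offset for v in values]


def differences(values: Series) -> Series:
    """A derived characteristic: change between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def accumulate(values: Series, window: int) -> Series:
    """Sum values over consecutive, non-overlapping windows.

    An incomplete trailing window is dropped.
    """
    return [sum(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


# The "processing functions framework": every function an LC table may use.
FRAMEWORK: Dict[str, Callable[..., Series]] = {
    "convert_units": convert_units,
    "differences": differences,
    "accumulate": accumulate,
}

# Hypothetical LC table: resource ID -> ordered (function name, parameters).
LC_TABLE: Dict[str, List[Tuple[str, dict]]] = {
    "rain_gauge_A": [("convert_units", {"scale": 0.1}),   # e.g. 0.1 mm -> mm
                     ("accumulate", {"window": 3})],
    "air_temp_B": [("convert_units", {"scale": 1.8, "offset": 32.0}),  # degC -> degF
                   ("differences", {})],
}


def run_life_cycle(resource_id: str, values: Series) -> Series:
    """Apply the resource's LC steps in the order listed in the table."""
    for name, params in LC_TABLE[resource_id]:
        values = FRAMEWORK[name](values, **params)
    return values


if __name__ == "__main__":
    print(run_life_cycle("air_temp_B", [10.0, 20.0, 15.0]))  # [18.0, -9.0]
```

Keeping the processing logic in a fixed function registry and the per-resource choices in data means a new Information Resource needs only a new table row, not new code.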

12 Example of good evidence
Workflows within the archive are organised according to an archival life cycle, ranging from acquisition/pre-ingest to dissemination of data. The central functions of the OAIS reference model can be mapped to the existing structure of the archive. For most parts of the corresponding workflows, procedures, standards and rules are in place. Although internal documentation exists, it is currently not complete and up to date for every activity. Significant steps to improve this situation have already been taken, but some work remains to be done. An overview of the different steps and processes (pre-ingest, ingest, study description, archiving, access) can be found on our website. All transformations made to data are documented. All significant corrections/changes of the data will be discussed with data depositors beforehand. General information about the handling of data is given on our website as well as during pre-ingest communication with depositors.

13 Example of good evidence
Standard procedures for changes applied to the data during ingest or at later stages are available (e.g. naming conventions, handling of missing data, versioning rules). Existing internal documentation is available to all staff members through a wiki. A concept for secure data management was developed and will be discussed by the department and implemented in… Data may contain confidential information that must not be accessed either by the public or by unauthorised staff. If this has not been clarified before the data are delivered to the archive, employees trained in data protection issues must decide at a very early stage whether a study contains confidential data that require special protection. All further steps depend on this decision. To accommodate sensitive data, the existing workflows were expanded with additional measures. For this purpose, we successfully tested the use of encrypted file containers. On the one hand, the solution should be as secure as possible; on the other hand, it should be as simple to handle as possible. In cases where archiving with GESIS cannot be realized, we try to find alternative data centres (e.g. partners in the network of research data centres of the German Data Forum) and bring data producers into contact with them. If there is no alternative option, GESIS tries to act as a (temporary) fallback option.

14 R14. Reusability
The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.
Guidance: Repositories must ensure that data can be understood and used effectively into the future despite changes in technology. This Requirement evaluates the measures taken to ensure that data are reusable. For this Requirement, responses should include evidence related to the following questions:
• Which metadata are required by the repository when the data are provided (e.g., Dublin Core or content-oriented metadata)?
• Are data provided in formats used by the Designated Community? Which formats?
• Are measures taken to account for the possible evolution of formats?
• Are plans related to future migrations in place?
• How does the repository ensure understandability of the data?

15 Replies received
Data from XXXX are published with relevant metadata, and reuse of the data for scientific purposes is not restricted. Data formats differ by data type. Examples of metadata schemas are the Directory Interchange Format (DIF) for the catalogue system and Darwin Core for biodiversity information. XXXX metadata are almost compatible with ISO …; the vocabulary comes from GCMD and other community standards. We have data curators to improve the understandability of metadata, but as the volume of data increases, we need to raise the level of understandability to enhance the findability of data, so we are now reviewing the current metadata to check whether they are good enough.
The requested digital data can be provided in either CSV or NetCDF format. Spatial data are available as WMS services. Object files can be downloaded from the repository by reference. Currently, attempts are being made to transform the data to the latest versions of these formats.

16 Example of good evidence
VLIZ works with different formats, depending on the data theme. Users submitting data to one of the repositories must describe the metadata using an ISO-compliant template. Checks are performed to verify accuracy and quality; these checks are based on the parameters, units, geographical information and the format used to describe date and time. More details can be found in the data submission guidelines. Data are stored in the original formats provided by the scientists. For data that will be integrated into other specific systems, a reformatting and standardization step is performed. The data are standardized into one of the following formats:
• Darwin Core Archive
• OBIS Scheme
• ODV Format: the format used to load data into the Ocean Data Viewer software
The integration of data into the different databases is based on the schemes mentioned above. Since the different formats evolve over time, the output formats from the relational databases are adapted accordingly.
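The submission checks described above (parameters, units, geographical information, date/time format) can be sketched as a simple record validator. This is an illustration under stated assumptions: the field names, the parameter/unit list and the choice of ISO 8601 for dates are hypothetical and are not VLIZ's actual template or rules.

```python
from datetime import datetime
from typing import Dict, List, Set

# Hypothetical controlled list: allowed units per parameter (not VLIZ's vocabulary).
ALLOWED_UNITS: Dict[str, Set[str]] = {
    "sea_water_temperature": {"degC"},
    "sea_water_salinity": {"PSU"},
}


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one submitted metadata record."""
    problems = []

    # Parameter and unit checks.
    param = record.get("parameter")
    if param not in ALLOWED_UNITS:
        problems.append(f"unknown parameter: {param!r}")
    elif record.get("unit") not in ALLOWED_UNITS[param]:
        problems.append(f"unit {record.get('unit')!r} not allowed for {param}")

    # Geographical checks: decimal degrees within valid ranges.
    try:
        lat = float(record["latitude"])
        lon = float(record["longitude"])
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            problems.append("coordinates out of range")
    except (KeyError, ValueError):
        problems.append("missing or non-numeric coordinates")

    # Date/time check: ISO 8601 as accepted by datetime.fromisoformat
    # (before Python 3.11 this rejects some valid forms, e.g. a trailing "Z").
    try:
        datetime.fromisoformat(record["datetime"])
    except (KeyError, ValueError):
        problems.append("datetime missing or not in ISO 8601 format")

    return problems


if __name__ == "__main__":
    record = {"parameter": "sea_water_temperature", "unit": "K",
              "latitude": "95", "longitude": "2.9", "datetime": "12/05/2016"}
    for problem in check_record(record):
        print(problem)
```

Returning a list of problems instead of stopping at the first failure lets a submitter fix everything in one round.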

17 Start the process!

18 Thank you!

