Working Group on Information Systems and Services (WGISS)

Slides:



Advertisements
Similar presentations
Configuration Management
Advertisements

ESA UNCLASSIFIED – For Official Use Data Stewardship Interest Group Session LTDP Activities & DSIG Introduction WGISS-37 Meeting Cocoa Beach (Florida-US)
Cybersecurity: Engineering a Secure Information Technology Organization, 1st Edition Chapter 7 Software Supporting Processes and Software Reuse.
Working Group: Practical Policy Rainer Stotzka, Reagan Moore.
Vijay V Vijayakumar.  SOX Act  Difference between IT Management and IT Governance  Internal Controls  Frameworks for Implementing SOX  COSO - Committee.
WGClimate John Bates NOAA SIT Workshop Agenda Item #8 WGClimate Work Plan progress & Issues CEOS SIT Technical Workshop CNES, Montpellier, France 17 th.
Georgia Institute of Technology CS 4320 Fall 2003.
GEO Work Plan Symposium 2014 Data Management Task Force.
WGISS Working Group on Information Systems and Services Richard MORENO CNES WGISS report, Agenda Item 14 Tromsø, Norway October 2014.
Draft GEO Framework, Chapter 6 “Architecture” Architecture Subgroup / Group on Earth Observations Presented by Ivan DeLoatch (US) Subgroup Co-Chair Earth.
Verification and Validation Assuring that a software system meets a user's needs.
DRAFT EDMC Procedural Directives NOAA Environmental Data Management Committee 12/3/2015 1
Data Management Principles Implementation Guidelines Document 10 David Halpern Co-chair of DMP-TF November 11th 2015 Mexico City.
Purpose: The purpose of CMM Integration is to provide guidance for improving your organization’s processes and your ability to manage the development,
Report of the Architecture and Data Committee (ADC) R.Shibasaki (ADC, Japan)
Improving Information Quality for Earth Science Data and Products – An Overview H. K. (Rama) Ramapriyan Science Systems and Applications, Inc. & NASA Goddard.
EO Dataset Preservation Workflow Data Stewardship Interest Group WGISS-37 Meeting Cocoa Beach (Florida-US) - April 14-18, 2014.
Project Management Strategies Hidden in the CMMI Rick Hefner, Northrop Grumman CMMI Technology Conference & User Group November.
ESA UNCLASSIFIED – For Official Use Data Stewardship Interest Group ESA – EO Data Stewardship Maturity Matrix WGISS#41 Meeting, Canberra, (AUS) 14–18 March,
Chang, Wen-Hsi Division Director National Archives Administration, 2011/3/18/16:15-17: TELDAP International Conference.
1. 2 NOAA’s Mission To describe and predict changes in the Earth’s environment. To conserve and manage the Nation’s coastal and marine resources to ensure.
QA4EO in 10 Minutes! A presentation to the 10 th GHRSST Science Team Meeting.
GEO Data Management Principles Implementation : World Data System–Data Seal of Approval (WDS-DSA) Core Certification of Digital Repositories Dr Mustapha.
QA4EO Update on the Quality Assurance Framework For Earth Observation Joint GSICS GDWG-GRWG meeting.
Database Principles: Fundamentals of Design, Implementation, and Management Chapter 1 The Database Approach.
AGU’s Data Management Maturity (DMM) Workshop ESIP Summer Meeting, Durham July 19, 2016 Shelley Stall AGU Assistant Director, Enterprise Data Management.
© 2016 Chapter 6 Data Management Health Information Management Technology: An Applied Approach.
Advanced Software Engineering Dr. Cheng
JMFIP Financial Management Conference
Data Stewardship Interest Group WGISS-43 Meeting
NASA Earth Science Data Stewardship
CESSDA SaW Training on Trust, Identifying Demand & Networking
Configuration Management
Data Stewardship Interest Group WGISS-42 Meeting
Software Quality Control and Quality Assurance: Introduction
Digital Repository Certification Schema A Pathway for Implementing the GEO Data Sharing and Data Management Principles Robert R. Downs, PhD Sr. Digital.
CEOS Work Plan Andrew Mitchell - NASA WGISS-43
Paul Eglitis [IEEE] and Siri Jodha S. Khalsa [IEEE]
Data Stewardship Interest Group WGISS-43 Meeting
GEOSS Interoperability Workshop
Ensuring and Improving Information Quality for Earth Science Data and Products – Role of the ESIP Information Quality Cluster H. K. (Rama) Ramapriyan,
WGISS Working Group for Information Systems & Services
Trustworthiness of Preservation Systems
Chapter 10 Software Quality Assurance& Test Plan Software Testing
Configuration Management
Software Engineering (CSI 321)
AGU Paper Number: IN43B-1697 Evolving a NASA Digital Object Identifiers System with Community Engagement Lalit Wanchoo1 and Nathan.
Active Data Management in Space 20m DG
Data Stewardship Interest Group WGISS-44 Meeting
Information Technology Project Management – Fifth Edition
2017 WPS Break-Out Session: Addressing Gaps
Agency Requirements: NOAA Administrative Order Management of environmental and geospatial data and information This training module is part of.
ESMF Governance Cecelia DeLuca NOAA CIRES / NESII April 7, 2017
CMMI Overview.
Digital Preservation measuring our capability and ‘confronting the abyss’ Sarah Slade on behalf of the NSLA Digital Preservation Group.
GEOSS EVOLVE Osamu Ochiai GEO Secretariat CEOS WGISS-43
CMMI – Staged Representation
WGISS-WGCV Joint Session
Engineering Processes
Data Stewardship Interest Group WGISS-45 Meeting
Data Stewardship Interest Group WGISS-45 Meeting
WGISS-WGCV Joint Activity Standardisation and Best Practices
Verification and Validation Unit Testing
International Financing Institutions (IFI) Engagement
Capability Maturity Model
Chapter # 8 Quality Management Standards
A Case Study for Synergistically Implementing the Management of Open Data Robert R. Downs NASA Socioeconomic Data and Applications.
Capability Maturity Model
Research data lifecycle²
Working Group on Information Systems and Services (WGISS)
Presentation transcript:

Working Group on Information Systems and Services (WGISS) Committee on Earth Observation Satellites Working Group on Information Systems and Services (WGISS) Iolanda Maggio, ESA, WGISS#47 GEOSS DMP IG Proposal - Discussion NOAA – Silver Spring, USA 1st of May 2019

BACKGROUND: Starting Points Scientific Data Stewardship Maturity Matrix: A Unified Framework for Measuring Stewardship Practices Applied to Digital Environmental Datasets. It indicates all possible activities to be performed to preserve and improve the information content, quality, accessibility, and usability of data and metadata. Gives a technical evaluation of the level of preservation and helps with self assessment of preservation. Data Management Principles: These principles address the need for discovery, accessibility, usability, preservation, and curation of the resources made available through GEOSS. The DMP provide guidelines that data providers and others can use as they seek to implement the principles. The DMP provide a basis for assessing how well the principles are being adhered to in practice. Capability Maturity Model Integration (CMMI) is a process improvement training and appraisal program and service administered and marketed by Carnegie Mellon University (CMU) and required by many DoD and U.S. Government contracts, especially in software development. CMU claims CMMI can be used to guide process improvement across a project, division, or an entire organization. CMMI defines the following maturity levels for processes: Initial, Managed and Defined. CMMI currently addresses three areas of interest: Product and service development — CMMI for Development (CMMI-DEV), Service establishment, management, — CMMI for Services (CMMI-SVC), and Product and service acquisition — CMMI for Acquisition (CMMI-ACQ). Climate Data Record Maturity Matrix (CDRMM): Bates and Privette, 2012 TECHNOLOGY READINESS LEVELS (TRLs): Standard which measures the maturity of new technologies Scientific Readiness Levels (SRL): FOR SPACE APPLICATIONS

The GEOSS Data Management Principles Implementation Guidelines Start Date: April 2014 GEOSS Data Management Principles Document Preliminary version reviewed by GEO-XI in November 2014 Incorporated in GEO Strategic Plan 2016-2025 Short Version on Page 12 and Long Version in Reference Document GEOSS Data Management Principles Implementation Guidelines Preliminary Version is GEO-XII Document 10 (40 pages) DMPTF Members and Partners

Data Management Principles The value of Earth observations are maximized through data life-cycle management based on ten principles supporting five themes. DISCOVERABILITY DMP-1: Data and metadata will be discoverable ACCESSIBILITY DMP-2: Data will be accessible via online services USABILITY DMP-3: Encoding DMP-4: Documentation DMP-5: Traceability DMP-6: Quality PRESERVATION DMP-7: Preservation DMP-8: Verification CURATION DMP 9: Review and reprocessing DMP 10: Persistent and resolvable identifiers

Maturity Matrix for Long-Term Scientific Data Stewardship – 2015, Ge Peng and Jeffrey L. Privette & Others Preservability: it focuses on assessing practices associated with data storage for resilience requirements, i.e., backup or a duplicate copy in a physically separate facility for disaster recovery, and on compliance to community-accepted archive practices and metadata standards. Preservability metadata are necessary for file storage and retrieval purposes only. They include a unique identifier for the dataset, file naming convention, file size, data volume, and, if available, a unique identifier for the collection- level metadata record associated with the dataset. The file naming convention is treated in this paper as being more relevant to preservability; however, the implication of defined file naming conventions on data usability and interoperability should also be considered when determining file naming conventions. Accessibility: The maturity of the accessibility key component will focus on whether users can easily find and access data online. It measures whether a dataset is searchable and discoverable for collection only or to the granule level. Usability: it deals with how easily users are able to use the data and learn whether the data are suitable for their own data requirements. It is closely tied to online documentation availability (e.g., Quick starter guide, Users’ guide, or Readme file), file format for interoperability, online data customization and visualization capability, and data characterization. It strives to alleviate the users’ burden of learning about and understanding the data. Production Sustainability: it is addressed in terms of various degrees of commitment for and associated requirements on the product. Here, it is assumed that the commitment is backed by the necessary financial support. At maturity level 5, changes for technology in data production should be routinely incorporated into planning for continued, sustained stewardship. Data Quality Assurance (DQA): is a set of activities or procedures focused on defect prevention to be followed in order to ensure product quality during development. Data quality screening (DQS) is a set of activities intended to ensure the source data are clean. DQS is a commonly used procedure for identifying missing or redundant records. Data Quality Control/ Monitoring: is a set of activities taken to evaluate the product to ensure that it conforms to the required specifications. It is product-oriented and focuses on data anomaly detection. It is usually carried out after the product is created or at each major milestone of the product development and processing cycle. Data Quality Assessment: is a set of activities designed to ensure that the products are scientifically sound (i.e., building the right thing), by carefully evaluating the product, usually by comparison with similar well-established and validated observations or data product(s). Transparency /Traceability: The focus here is on the level of availability of information about the product and how it was created, the level of practices associated with management of documents, source code, and system information, and whether data and publication citations were tracked, such as by utilizing a Digital Object Identifier (DOI) system. Data Integrity: Data integrity refers to the validity of data, i.e., the accuracy and consistency of data over its entire lifecycle. The Data integrity component in this version of the stewardship maturity matrix primarily assesses the practices applied to datasets to ensure the data files are free of intentional or unintentional corruption during data transfer, ingest, storage, and dissemination and to ensure data authenticity at access.

1^ EXERCISE: DMP IG content completeness measuring through the Data Stewardship Maturity Matrix

DMP Maturity Matrix Assessment

DMP Maturity Matrix Assessment

DMP Maturity Matrix Assessment

Assessment results and Rating

DMP Measurement Results Preservability: it focuses on assessing practices associated with data storage for resilience requirements, i.e., backup or a duplicate copy in a physically separate facility for disaster recovery, and on compliance to community-accepted archive practices and metadata standards. Preservability metadata are necessary for file storage and retrieval purposes only. They include a unique identifier for the dataset, file naming convention, file size, data volume, and, if available, a unique identifier for the collection- level metadata record associated with the dataset. The file naming convention is treated in this paper as being more relevant to preservability; however, the implication of defined file naming conventions on data usability and interoperability should also be considered when determining file naming conventions. Accessibility: The maturity of the accessibility key component will focus on whether users can easily find and access data online. It measures whether a dataset is searchable and discoverable for collection only or to the granule level. Usability: it deals with how easily users are able to use the data and learn whether the data are suitable for their own data requirements. It is closely tied to online documentation availability (e.g., Quick starter guide, Users’ guide, or Readme file), file format for interoperability, online data customization and visualization capability, and data characterization. It strives to alleviate the users’ burden of learning about and understanding the data. Production Sustainability: it is addressed in terms of various degrees of commitment for and associated requirements on the product. Here, it is assumed that the commitment is backed by the necessary financial support. At maturity level 5, changes for technology in data production should be routinely incorporated into planning for continued, sustained stewardship. Data Quality Assurance (DQA): is a set of activities or procedures focused on defect prevention to be followed in order to ensure product quality during development. Data quality screening (DQS) is a set of activities intended to ensure the source data are clean. DQS is a commonly used procedure for identifying missing or redundant records. Data Quality Control/ Monitoring: is a set of activities taken to evaluate the product to ensure that it conforms to the required specifications. It is product-oriented and focuses on data anomaly detection. It is usually carried out after the product is created or at each major milestone of the product development and processing cycle. Data Quality Assessment: is a set of activities designed to ensure that the products are scientifically sound (i.e., building the right thing), by carefully evaluating the product, usually by comparison with similar well-established and validated observations or data product(s). Transparency /Traceability: The focus here is on the level of availability of information about the product and how it was created, the level of practices associated with management of documents, source code, and system information, and whether data and publication citations were tracked, such as by utilizing a Digital Object Identifier (DOI) system. Data Integrity: Data integrity refers to the validity of data, i.e., the accuracy and consistency of data over its entire lifecycle. The Data integrity component in this version of the stewardship maturity matrix primarily assesses the practices applied to datasets to ensure the data files are free of intentional or unintentional corruption during data transfer, ingest, storage, and dissemination and to ensure data authenticity at access. Level 5 is Level 4 + other things

2^ EXERCISE: Create a WIGISS Maturity Matrix with DMP IG simplifying the Data Stewardship Maturity Matrix

Approach NEED: Create a Maturity Matrix with DMP IG simplifying the WGISS EO Data Stewardship Maturity Matrix Implemented Solutions: Substituting the Key components of the Maturity Matrix with the DMP IG Normalizing the number of maturity levels reducing them to new 4 levels Giving a weight to every components of the principles in order to give an incremental order for the implementation of the preservation and curation activities

DMP Guidelines and key Components mapping DMP Implementation Guidelines MM Key Components DMP-1: Discoverability Accessibility DMP-2: Accessibility DMP-3: Encoding Usability DMP-4: Documentation DMP-5: Traceability Transparency /Traceability DMP-6: Quality Data Quality Control/ Monitoring Data Quality Assessment Data Quality Assurance DMP-7: Preservation Preservability DMP-8: Verification Data Integrity DMP-9: Review and reprocessing Production Sustainability DMP-10: Persistent and resolvable identifiers invertire

WGISS EO Data Stewardship Maturity Matrix Preservability: it focuses on assessing practices associated with data storage for resilience requirements, i.e., backup or a duplicate copy in a physically separate facility for disaster recovery, and on compliance to community-accepted archive practices and metadata standards. Preservability metadata are necessary for file storage and retrieval purposes only. They include a unique identifier for the dataset, file naming convention, file size, data volume, and, if available, a unique identifier for the collection- level metadata record associated with the dataset. The file naming convention is treated in this paper as being more relevant to preservability; however, the implication of defined file naming conventions on data usability and interoperability should also be considered when determining file naming conventions. Accessibility: The maturity of the accessibility key component will focus on whether users can easily find and access data online. It measures whether a dataset is searchable and discoverable for collection only or to the granule level. Usability: it deals with how easily users are able to use the data and learn whether the data are suitable for their own data requirements. It is closely tied to online documentation availability (e.g., Quick starter guide, Users’ guide, or Readme file), file format for interoperability, online data customization and visualization capability, and data characterization. It strives to alleviate the users’ burden of learning about and understanding the data. Production Sustainability: it is addressed in terms of various degrees of commitment for and associated requirements on the product. Here, it is assumed that the commitment is backed by the necessary financial support. At maturity level 5, changes for technology in data production should be routinely incorporated into planning for continued, sustained stewardship. Data Quality Assurance (DQA): is a set of activities or procedures focused on defect prevention to be followed in order to ensure product quality during development. Data quality screening (DQS) is a set of activities intended to ensure the source data are clean. DQS is a commonly used procedure for identifying missing or redundant records. Data Quality Control/ Monitoring: is a set of activities taken to evaluate the product to ensure that it conforms to the required specifications. It is product-oriented and focuses on data anomaly detection. It is usually carried out after the product is created or at each major milestone of the product development and processing cycle. Data Quality Assessment: is a set of activities designed to ensure that the products are scientifically sound (i.e., building the right thing), by carefully evaluating the product, usually by comparison with similar well-established and validated observations or data product(s). Transparency /Traceability: The focus here is on the level of availability of information about the product and how it was created, the level of practices associated with management of documents, source code, and system information, and whether data and publication citations were tracked, such as by utilizing a Digital Object Identifier (DOI) system. Data Integrity: Data integrity refers to the validity of data, i.e., the accuracy and consistency of data over its entire lifecycle. The Data integrity component in this version of the stewardship maturity matrix primarily assesses the practices applied to datasets to ensure the data files are free of intentional or unintentional corruption during data transfer, ingest, storage, and dissemination and to ensure data authenticity at access. Level 5 is Level 4 + other things ESA UNCLASSIFIED – For Official Use

WGISS Data Management and Stewardship Maturity Matrix - White Paper Data stewardship “encompasses all activities that preserve and improve the information content, accessibility, and usability of data and metadata” (National Research Council 2007).   Data management includes all activities for “planning, execution and oversight of policies, practices and projects that acquire, control, protect, deliver and enhance the value of data and information assets.” (Mosely et al. 2009). In addition to data preservation, accessibility, and usability, WGISS MM also covers discoverability, use and services capabilities, program budget, data policy, etc.

WGISS Contribution to DMP Evolution Preservability: it focuses on assessing practices associated with data storage for resilience requirements, i.e., backup or a duplicate copy in a physically separate facility for disaster recovery, and on compliance to community-accepted archive practices and metadata standards. Preservability metadata are necessary for file storage and retrieval purposes only. They include a unique identifier for the dataset, file naming convention, file size, data volume, and, if available, a unique identifier for the collection- level metadata record associated with the dataset. The file naming convention is treated in this paper as being more relevant to preservability; however, the implication of defined file naming conventions on data usability and interoperability should also be considered when determining file naming conventions. Accessibility: The maturity of the accessibility key component will focus on whether users can easily find and access data online. It measures whether a dataset is searchable and discoverable for collection only or to the granule level. Usability: it deals with how easily users are able to use the data and learn whether the data are suitable for their own data requirements. It is closely tied to online documentation availability (e.g., Quick starter guide, Users’ guide, or Readme file), file format for interoperability, online data customization and visualization capability, and data characterization. It strives to alleviate the users’ burden of learning about and understanding the data. Production Sustainability: it is addressed in terms of various degrees of commitment for and associated requirements on the product. Here, it is assumed that the commitment is backed by the necessary financial support. At maturity level 5, changes for technology in data production should be routinely incorporated into planning for continued, sustained stewardship. Data Quality Assurance (DQA): is a set of activities or procedures focused on defect prevention to be followed in order to ensure product quality during development. Data quality screening (DQS) is a set of activities intended to ensure the source data are clean. DQS is a commonly used procedure for identifying missing or redundant records. Data Quality Control/ Monitoring: is a set of activities taken to evaluate the product to ensure that it conforms to the required specifications. It is product-oriented and focuses on data anomaly detection. It is usually carried out after the product is created or at each major milestone of the product development and processing cycle. Data Quality Assessment: is a set of activities designed to ensure that the products are scientifically sound (i.e., building the right thing), by carefully evaluating the product, usually by comparison with similar well-established and validated observations or data product(s). Transparency /Traceability: The focus here is on the level of availability of information about the product and how it was created, the level of practices associated with management of documents, source code, and system information, and whether data and publication citations were tracked, such as by utilizing a Digital Object Identifier (DOI) system. Data Integrity: Data integrity refers to the validity of data, i.e., the accuracy and consistency of data over its entire lifecycle. The Data integrity component in this version of the stewardship maturity matrix primarily assesses the practices applied to datasets to ensure the data files are free of intentional or unintentional corruption during data transfer, ingest, storage, and dissemination and to ensure data authenticity at access. Level 5 is Level 4 + other things

Proposal for improvement of the GEOSS DMP IG using the Maturity Matrix Not exhaustive AREAS to be supported: DMP-1: Discoverability: e.g. Automatic Dissemination Reports and Future Technologies and standard changes planned DMP-6: Quality: e.g. Data uncertainty and reliability, Data Integrity DMP-9: Review and Reprocessing: e.g. introducing and improving the Sustainability concepts.

Open Points and Way Forward GEOSS Evolve initiative could be not more supported BUT GEO indicated interest on FAIR DMP RDA is fully involved on FAIR Data Maturity Model with a Working Group already fully active on this topic Next Steps: Verification of last version of DMP IG Finalise and circulate the GEOSS DMP IG Improvement proposal The relevant due date need to be defined after clarification of the Open Points