Michelle Gierach, PO.DAAC Project Scientist; Eric Tauer, PO.DAAC Project System Engineer.

Presentation transcript:

Michelle Gierach, PO.DAAC Project Scientist; Eric Tauer, PO.DAAC Project System Engineer

We saw a theme spanning several 2011 UWG recommendations (6, 14, 19, 20, 23). The theme speaks to a fundamental need and goal: approach and handle datasets with consistency, and accept and/or deploy them only when it makes sense to do so. This is worth solving! We want to provide the right datasets, and we want users to be able to easily connect with the right datasets. We enthusiastically agree with the UWG recommendations. Therefore, our intent is to capture the lifecycle policy (including how we accept, handle, and characterize datasets) to ensure: consistency in our approach, soundness in our decisions, and the availability of descriptive measures to our users.

In the next two discussions, we will address those five UWG recommendations (6, 14, 19, 20, 23) via the following talks:
1. The proposed end-to-end lifecycle phases (enabling consistency) and assessment criteria (describing our data to our users) (Eric)
2. The results of the Gap Analysis and the corresponding Dataset Assessment (Michelle)

Recommendation 6. Carry out the dataset gap analyses and create a reporting structure that categorizes what is available, what could be created, the potential costs involved, estimates of user needs, and other data management factors. This compilation should enable prioritization of efforts that will fill the most significant data voids.
Recommendation 14. There needs to be a clear path for all datasets generated outside of PO.DAAC to be accepted and hosted by the PO.DAAC. The PSTs have a role in determining whether a dataset is valuable and of good quality. The processes and procedures should be published and readily available to potential dataset developers. All datasets should go through the same data acceptance path. A metric exclusively based on the number of peer-reviewed papers using the dataset is NOT recommended.
Recommendation 19. The UWG has previously recommended that the PO.DAAC work on providing climatologies, anomalies, indices, and various dataset statistics for selected datasets. This does not include developing CDRs as part of the core PO.DAAC mission. This recommendation is repeated because it could be partially complementary to the IPCC/CMIP5 efforts; e.g., these climatologists prefer to work with global monthly mean data fields. Contributions of CDR datasets to PO.DAAC from outside research should be considered.
Recommendation 20. Better up front planning is required if NASA research program results are to be directed toward PO.DAAC. Datasets must meet format and metadata standards and contribute to the body of core data types. The Dataset Lifecycle Management plans are a framework for these decisions. Software must be designed to integrate with and beneficially augment the PO.DAAC systems. PO.DAAC should not accept orphan datasets or software projects.
Recommendation 23. Guiding users to data: explain and use Dataset Lifecycle Management vocabulary with appropriate linkages; clarify what "Sort by 'Popularity...'" means.

The specification and documentation of the Dataset Lifecycle Policy stems from UWG Recommendations 14, 20, and 23:
"There needs to be a clear path for all datasets generated outside of PO.DAAC to be accepted and hosted by the PO.DAAC"
"All datasets should go through the same data acceptance path"
"Better up front planning is required if NASA research program results are to be directed toward PO.DAAC"
"Dataset Lifecycle Management plans are a framework for these decisions"
"Explain and use Dataset Lifecycle Management vocabulary with appropriate linkages"

Consistency in our approach. Match users to data. Major goal: better describe our data to better map it to our users.

First: Define Lifecycle Phases to control consistency.

Dataset lifecycle work is underway both internal and external to PO.DAAC.
Internal: significant research and work performed by Chris Finch (UWG 2010 presentation); work within PO.DAAC to streamline the process; mature teams with a very solid understanding of their roles; an existing exit-criteria checklist for product release.
External: a good deal of reference material is available via industry efforts and progress; models can be leveraged from implementations at other DAACs, Big Data, and DataONE.
Question: Any specific recommendations regarding lifecycle models appropriate to PO.DAAC?

Phases and descriptions*:
1. Identify a Dataset of Interest: This phase controls the identification of a dataset and its submission as a candidate, performing some order of cost/benefit analysis.
2. Green-light the Dataset: Review the information on a candidate dataset, indicating a go/no-go for inclusion at PO.DAAC.
3. Tailor the Dataset Policy: Set the expectations with respect to the policy; identify areas for waiver or non-applicability, if any.
4. Ingest the Dataset: Determine and verify the route to obtain data under this dataset; collect dataset-related metadata.
5. Archive the Dataset: Determine and verify how to process the data; identify reformatting needs, metadata extraction, and the initial validation strategy.
6. Register/Catalog the Dataset: Perform the preparatory steps required to ultimately enroll this dataset into the catalogs, FTP site(s), etc.
7. Distribute the Dataset: (Was "Integrate.") Identify and complete the work to tie the dataset in to search engines, visualization tools, and services.
8. Verify the Dataset: Identify the test approach, plans, and procedures for verifying any and all PO.DAAC manipulation of this dataset's granules. Define and document the level of verification to be performed. (Roll up all validation from prior steps.)
9. Rollout the Dataset: Finalize the dataset information page; review the dataset for readiness. Deploy to operations and notify the community of availability.
10. Maintain the Dataset: This phase controls actions that may be needed to maintain the data in-house over the longer term, including re-processing, superseding, and versioning.
*Additionally, we include "Retire the Dataset," but these are the primary operational phases.
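Illustrative only (not part of the policy document): a minimal Python sketch of how these ten phases could be encoded as an ordered sequence. The names LifecyclePhase and next_phase are hypothetical, chosen for this sketch.

```python
from enum import IntEnum
from typing import Optional

class LifecyclePhase(IntEnum):
    """The ten primary operational phases, in order (names are ours, for illustration)."""
    IDENTIFY = 1        # Identify a Dataset of Interest
    GREEN_LIGHT = 2     # Green-light the Dataset
    TAILOR_POLICY = 3   # Tailor the Dataset Policy
    INGEST = 4          # Ingest the Dataset
    ARCHIVE = 5         # Archive the Dataset
    CATALOG = 6         # Register/Catalog the Dataset
    DISTRIBUTE = 7      # Distribute the Dataset
    VERIFY = 8          # Verify the Dataset
    ROLLOUT = 9         # Rollout the Dataset
    MAINTAIN = 10       # Maintain the Dataset

def next_phase(current: LifecyclePhase) -> Optional[LifecyclePhase]:
    """Return the next phase in sequence; Maintain is the last operational phase."""
    if current is LifecyclePhase.MAINTAIN:
        return None
    return LifecyclePhase(current.value + 1)

# Example: a dataset that has just been green-lit moves on to policy tailoring.
assert next_phase(LifecyclePhase.GREEN_LIGHT) is LifecyclePhase.TAILOR_POLICY
```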

[Diagram: the Lifecycle Policy, informed by ESDIS goals, user goals, and PO.DAAC dataset goals, controls how we do business (procedures), a consistent approach, and how we describe our data (maturity).]

Second: Define measurements related to the Maturity Index.

We want to quantitatively evaluate our datasets. We don't want to claim datasets are "good" or "bad." NASA and NOAA call their evaluation "maturity." Question (rhetorical, at this point): What does "maturity" mean to you? Do you prefer it to "Assessment and Characterization"?

Over the lifecycle, various data points are collected: decisional (e.g., uniqueness: rare or hard-to-find data) and descriptive (e.g., spatial resolution). Those data points might control decisions or flow (exit criteria) and/or might be used to describe the "maturity" to the user. We think "maturity" means a quantified characterization of dataset features; a higher number means more "mature."
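As a small, purely illustrative sketch (the DataPoint structure, field names, and example values below are our assumptions, not from the policy), the decisional/descriptive distinction could be recorded alongside each collected value:

```python
from dataclasses import dataclass

@dataclass
class DataPoint:
    """One piece of information collected about a dataset during the lifecycle."""
    name: str          # e.g., "Uniqueness" or "Spatial Resolution"
    value: str         # the recorded observation (hypothetical values below)
    phase: str         # lifecycle phase in which it was collected
    decisional: bool   # feeds go/no-go decisions or exit criteria
    descriptive: bool  # surfaced to users as part of the maturity description

# The two kinds of data point mentioned on the slide above:
uniqueness = DataPoint("Uniqueness", "rare / hard-to-find data",
                       "Identify a Dataset of Interest",
                       decisional=True, descriptive=False)
resolution = DataPoint("Spatial Resolution", "0.25 degree",
                       "Ingest the Dataset",
                       decisional=False, descriptive=True)
```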

Identify a Dataset of Interest → Green-light the Dataset → Tailor the Dataset Policy → Ingest the Dataset → Archive the Dataset → Register/Catalog the Dataset → Distribute the Dataset → Verify the Dataset → Rollout the Dataset → Maintain the Dataset. Knowledge of maturity increases across the phases through constant collection.

The creation of a PO.DAAC Dataset Maturity Model stems from UWG Recommendations 6, 14, 20, and 23:
[Identify the] "potential costs involved, estimates of user needs, and other data management factors"
"The PSTs have a role in determining whether a dataset is valuable and of good quality. The processes and procedures should be published and readily available to potential dataset developers"
"A metric exclusively based on the number of peer-reviewed papers using the dataset is NOT recommended."
"Datasets must meet format and metadata standards"
"PO.DAAC should not accept orphan datasets"
"Clarify what Sort by 'Popularity...' means"

We adhere to the lifecycle for consistency, but a key outcome of the lifecycle must be maturity measures. Sample maturity scorecard:
Community Assessment: 3
Technical Quality: 4
Processing: 3
Provenance: 3
Documentation: 5
Adherence to Process Guidelines: 5
Toolkits: 5
Relationships: 4
Specification: 4
Overarching Maturity Index: 4

Beta: products intended to enable users to gain familiarity with the parameters and the data formats.
Provisional: products defined to facilitate data exploration and process studies that do not require rigorous validation. These data are partially validated and improvements are continuing; quality may not be optimal since validation and quality assurance are ongoing.
Validated: high-quality data that have been fully validated and quality checked, and that are deemed suitable for systematic studies such as climate change, as well as for shorter-term process studies. These are publication-quality data with well-defined uncertainties, but they are also subject to continuing validation, quality assurance, and further improvements in subsequent versions. Users are expected to be familiar with quality summaries of all data before publication of results; when in doubt, contact the appropriate instrument team.
Stage 1 Validation: product accuracy is estimated using a small number of independent measurements obtained from selected locations and time periods and ground-truth/field program efforts.
Stage 2 Validation: product accuracy is estimated over a significant set of locations and time periods by comparison with reference in situ or other suitable reference data. Spatial and temporal consistency of the product, and its consistency with similar products, have been evaluated over globally representative locations and time periods. Results are published in the peer-reviewed literature.
Stage 3 Validation: product accuracy has been assessed. Uncertainties in the product and its associated structure are well quantified from comparison with reference in situ or other suitable reference data. Uncertainties are characterized in a statistically robust way over multiple locations and time periods representing global conditions. Spatial and temporal consistency of the product, and its consistency with similar products, have been evaluated over globally representative locations and periods. Results are published in the peer-reviewed literature.
Stage 4 Validation: validation results for Stage 3 are systematically updated when new product versions are released and as the time series expands.

Maturity level 1: Sensor Use: research mission. Algorithm Stability (including ancillary inputs): significant changes likely. Metadata & QA: incomplete. Documentation: draft ATBD. Validation: minimal. Public Release: limited data availability to develop familiarity. Science and Applications: little or none.
Maturity level 2: Sensor Use: research mission. Algorithm Stability: some changes expected. Metadata & QA: research grade (extensive). Documentation: ATBD version 1+. Validation: uncertainty estimated for select locations or times. Public Release: data available but of unknown accuracy; caveats required for data use. Science and Applications: limited or ongoing.
Maturity level 3: Sensor Use: research mission. Algorithm Stability: minimal change expected. Metadata & QA: research grade (extensive); meets international standards. Documentation: public ATBD; peer-reviewed algorithm and product descriptions. Validation: uncertainty estimated over widely distributed times/locations by multiple investigators; differences understood. Public Release: data available but of unknown accuracy; caveats required for data use. Science and Applications: provisionally used in applications and assessments demonstrating positive value.
Maturity level 4: Sensor Use: operational mission. Algorithm Stability: minimal change expected. Metadata & QA: stable; allows provenance tracking and reproducibility; meets international standards. Documentation: public ATBD; draft Operational Algorithm Description (OAD); peer-reviewed algorithm and product descriptions. Validation: uncertainty estimated over widely distributed times/locations by multiple investigators; differences understood. Public Release: data available but of unknown accuracy; caveats required for data use. Science and Applications: provisionally used in applications and assessments demonstrating positive value.
Maturity level 5: Sensor Use: all relevant research and operational missions; unified and coherent record demonstrated across different sensors. Algorithm Stability: stable and reproducible. Metadata & QA: stable; allows provenance tracking and reproducibility; meets international standards. Documentation: public ATBD, OAD, and validation plan; peer-reviewed algorithm, product, and validation articles. Validation: consistent uncertainties estimated over most environmental conditions by multiple investigators. Public Release: multi-mission record is publicly available with associated uncertainty estimate. Science and Applications: used in various published applications and assessments by different investigators.
Maturity level 6: Sensor Use: all relevant research and operational missions; unified and coherent record over the complete series; record is considered scientifically irrefutable following extensive scrutiny. Algorithm Stability: stable and reproducible; homogeneous and published error budget. Metadata & QA: stable; allows provenance tracking and reproducibility; meets international standards. Documentation: product, algorithm, validation, processing, and metadata described in peer-reviewed literature. Validation: observation strategy designed to reveal systematic errors through independent cross-checks, open inspection, and continuous interrogation. Public Release: multi-mission record is publicly available from a long-term archive. Science and Applications: used in various published applications and assessments by different investigators.
See: ftp://ftp.ncdc.noaa.gov/pub/data/sds/ms-privette-P1.3.conf.header.pdf

Community Assessment: papers written / number of citations; # of likes; # of downloads/views.
Technical Quality: QQC + latency/gappiness; accuracy; sampling issues?; caveats/known issues identified?
Processing: has it been manipulated?; cal/val state?; verification state?
Provenance: maturity of platform/instrument/sensor; maturity of program; parent datasets identified (if applicable); is the sensor fully described?; is the context of the reading(s) fully described?; state-of-the-art technology?
Documentation: what is the state of the documentation?; is the documentation captured (archived)?
Adherence to Process Guidelines: did it get fast-tracked?; tons of waivers?; were all exit criteria met satisfactorily?; consistent use of units?
Access: readily available?; foreign repository?; behind firewalls or open FTP?
Toolkits: data visualization routine?; data reader?; verified reader/subroutine?
Relationships: sibling/child datasets identified?; motivation/justification identified?; rarity: hard-to-find data?; atypical sensor/resolution/etc.?
Specification: resolution (spatial/temporal); spatial coverage; start time; end time; data format?; exotic structure?; sizing/volume expectation?
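The policy does not say how answers to these questions map to the 1-5 category scores. As one conceivable illustration only (the mapping and the function category_score are our assumptions), a category score could be scaled from the fraction of its criteria that are satisfied:

```python
def category_score(answers):
    """Map a list of yes/no criterion answers to a 1-5 score.

    Assumed mapping, for illustration only: the fraction of criteria
    satisfied is scaled linearly onto the 1-5 scorecard range.
    """
    satisfied = sum(1 for a in answers if a)
    return 1 + round(4 * satisfied / len(answers))

# Hypothetical "Toolkits" answers: data visualization routine? data reader?
# verified reader/subroutine?
print(category_score([True, True, True]))    # 5
print(category_score([True, False, False]))  # 2
```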

Maturity (the same sample scorecard): Community Assessment: 3; Technical Quality: 4; Processing: 3; Provenance: 3; Documentation: 5; Adherence to Process Guidelines: 5; Toolkits: 5; Relationships: 4; Specification: 4; Overarching Maturity Index: 4.
Users would be presented with layers of information: scores derived from the various criteria categories, and an ultimate maturity index (a simple mathematical average) from the combined values. This could ultimately allow weighting, but at this point that seems like it would overcomplicate things.
Question: What does "maturity" mean to you? Do you prefer it to "Assessment and Characterization"? Does this provide better-described datasets and better mapping of data to our users?
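A minimal sketch of the index computation described above (a simple mathematical average of the category scores, with the set-aside weighting option shown only as an illustration); the function and variable names are ours, purely for illustration:

```python
from typing import Dict, Optional

def maturity_index(scores: Dict[str, float],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-category maturity scores into one overarching index.

    With no weights this is the simple mathematical average the slide describes;
    the optional weights only illustrate the weighting idea that was set aside.
    """
    if weights is None:
        return sum(scores.values()) / len(scores)
    total = sum(weights.get(cat, 1.0) for cat in scores)
    return sum(s * weights.get(cat, 1.0) for cat, s in scores.items()) / total

# The sample scorecard shown above:
scores = {
    "Community Assessment": 3, "Technical Quality": 4, "Processing": 3,
    "Provenance": 3, "Documentation": 5, "Adherence to Process Guidelines": 5,
    "Toolkits": 5, "Relationships": 4, "Specification": 4,
}
print(maturity_index(scores))  # 4.0, matching the Overarching Maturity Index of 4
```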

The lifecycle document, while capturing process, becomes a means to an even greater end. The driving current is consistency, and since our goals hinge on matching users to datasets, the lifecycle becomes the means of ensuring fully characterized datasets. We hope the approach is reasonable, and that we are accurate in our assessment that the policy aspects of the Dataset Lifecycle can and will help ensure conformity to process and the consistent availability of maturity data across all PO.DAAC holdings.
Next steps: We need to ultimately identify (and, if necessary, implement) the infrastructure needed to guide us through this lifecycle. We still need to resolve some key questions, such as: How does the lifecycle morph with respect to different types of datasets (remote datasets? self-generated datasets?)?

Identify a Dataset of Interest → Green-light the Dataset → Tailor the Dataset Policy → Ingest the Dataset → Archive the Dataset → Register/Catalog the Dataset → Distribute the Dataset → Verify the Dataset → Rollout the Dataset → Maintain the Dataset. Michelle's discussion starts here…