Michelle Gierach, PO.DAAC Project Scientist 2012 PO.DAAC User Working Group (UWG) Meeting March 7-8, 2012
Rec. 6. Do dataset gap analysis and create a report.
Rec. 19. PO.DAAC provide climatologies, anomalies, indices, and various dataset statistics for selected datasets.
Status: A dataset gap analysis document was created that details datasets currently available and those that will soon be available in the ocean community. Available climatologies, anomalies, and other value-added products (e.g., fluxes, frontal gradients) were included in the document. Approx. 100 datasets were listed based upon input from the PO.DAAC User Working Group (UWG), Project Science Team (PST), and Data Engineers (DEs).
Future Plans: Request information twice a year from the UWG, PST, DEs, and NASA science teams regarding additional datasets. After this initial phase of acquiring available datasets, the next step in FY13 will be to see where gaps still exist and either work with the community to create additional climatologies, anomalies, and indices or create them ourselves within PO.DAAC.
Now that we have a document that lists ~100 datasets that would be of benefit to our users, how do we prioritize? Past prioritization has been subjective and ad hoc. We need a system that is unbiased and provides quantitative measures to assess a dataset’s significance.
- Identify a Dataset of Interest
- Green-Light the Dataset
- Tailor the Dataset Policy
- Ingest the Dataset
- Archive the Dataset
- Register/Catalog the Dataset
- Distribute the Dataset
- Verify the Dataset
- Rollout the Dataset
- Maintain the Dataset
[Figure: dataset prioritization flowchart. Box labels: Identify Datasets; Dataset List; Dataset Classification; Obligated; Non-Obligated; Significance; Ranked Dataset List; Cost Analysis; Recommendation; Remote Link; Archive; Reject Dataset.]
Comments/Thoughts/Questions?
[Flowchart repeated on four slides, highlighting in turn Step 1 (with Steps 1a and 1b), Step 2, Step 3, and Step 4.]
Comments/Thoughts/Questions?
[Flowchart repeated, highlighting Step 1, Step 1a, and Step 1b.]
Approx. 100 datasets were identified within the oceanographic community. Seven of these were classified as PO.DAAC obligations:
- L2B reprocessed QuikSCAT data (JPL)
- L2C QuikSCAT data (JPL)
- MEaSUREs CCMP-like product (Bourassa)
- GHRSST Pathfinder 5.2 SST
- GHRSST Global Ocean Sea Surface Temperature Multi Product Ensemble (GMPE)
- GHRSST Global Ocean OSTIA Sea Surface Temperature Anomaly
- GHRSST Global Ocean OSTIA Sea Surface Temperature Anomaly Reanalysis
First priority is given to datasets labeled as PO.DAAC obligations.
[Flowchart repeated, highlighting Step 2.]
Community Assessment: Papers written / number of citations; # of Likes; # of downloads/views
Technical Quality: QQC+Latency / Gappiness; Accuracy; Sampling issues? Caveats/known issues identified?
Processing: Has it been manipulated? Cal/Val state? Verification state?
Provenance: Maturity of platform/instrument/sensor; Maturity of Program; Parent datasets identified (if applicable); Is the sensor fully described? Is the context of the reading(s) fully described? State-of-the-Art technology?
Documentation: What is the state of the documentation? Is the documentation captured (archived)?
Adherence to Process Guidelines: Did it get fast-tracked? Tons of waivers? Were all exit criteria met satisfactorily? Consistent use of units?
Access: Readily available? Foreign repository? Behind firewalls or open FTP?
Toolkits: Data visualization routine? Data reader? Verified reader/subroutine?
Relationships: Sibling/child datasets identified? Motivation/justification identified?
Rarity: Hard-to-find data? Atypical sensor/resolution/etc.?
Specification: Resolution (spatial / temporal); Spatial coverage; Start time; End time; Data format? Exotic data structure? Sizing / volume expectation?
Comments/Thoughts/Questions?
Prioritization criteria to assess a non-obligated dataset’s significance:
- Source: A particular dataset’s association.
    PO.DAAC-centric NASA mission/project (1)
    Non-PO.DAAC-centric NASA mission/project (0.75)
    Domestic (non-NASA) mission/project (0.5)
    International mission/project (0.25)
- Uniqueness: Would this be a new and/or one-of-a-kind dataset to PO.DAAC? Yes/No (1/0)
- Desirability: Is there a need/want for this dataset in the community? High/Medium/Low (1/0.5/0)
- Maturity (1st order): Community recognition? Technical Quality? Dataset Specifics? High/Medium/Low (1/0.5/0)
Score = (Source_Score * 25) + (Unique_Score * 20) + (Desirability_Score * 30) + (Maturity_Score * 25)
4 Prioritization Groups: 1st tier (green); 2nd tier (yellow); 3rd tier (orange); 4th tier (pink)
Comments/Thoughts/Questions?
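As an illustration only, the scoring formula and criterion values from the slides can be sketched in a few lines of Python. The candidate names, the dictionary keys, and the `Candidate` class are hypothetical and not part of the PO.DAAC process. The slides do not give the score cutoffs for the four tiers, so the sketch only computes scores and ranks candidates.

```python
from dataclasses import dataclass

# Criterion values as listed on the slide; the key names are hypothetical.
SOURCE_SCORES = {
    "podaac_nasa": 1.0,     # PO.DAAC-centric NASA mission/project
    "other_nasa": 0.75,     # Non-PO.DAAC-centric NASA mission/project
    "domestic": 0.5,        # Domestic (non-NASA) mission/project
    "international": 0.25,  # International mission/project
}
LEVEL_SCORES = {"high": 1.0, "medium": 0.5, "low": 0.0}


@dataclass
class Candidate:
    """A non-obligated dataset under consideration (hypothetical structure)."""
    name: str
    source: str        # key into SOURCE_SCORES
    unique: bool       # Uniqueness: Yes/No -> 1/0
    desirability: str  # key into LEVEL_SCORES
    maturity: str      # key into LEVEL_SCORES


def significance_score(c: Candidate) -> float:
    """Weighted sum from the slide; the weights 25 + 20 + 30 + 25 total 100."""
    return (
        SOURCE_SCORES[c.source] * 25
        + (1.0 if c.unique else 0.0) * 20
        + LEVEL_SCORES[c.desirability] * 30
        + LEVEL_SCORES[c.maturity] * 25
    )


# Made-up candidates, purely for illustration.
candidates = [
    Candidate("example_a", "podaac_nasa", True, "high", "high"),
    Candidate("example_b", "international", False, "medium", "low"),
    Candidate("example_c", "domestic", True, "low", "medium"),
]

# Highest score first.
ranked = sorted(candidates, key=significance_score, reverse=True)
for c in ranked:
    print(f"{significance_score(c):6.2f}  {c.name}")
```

With these made-up inputs, example_a scores 100, example_c scores 45, and example_b scores 21.25. Because the weights sum to 100 and the lowest Source value is 0.25, any score falls between 6.25 and 100.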
[Flowchart repeated, highlighting Step 3.]
Community Assessment: Papers written / number of citations; # of Likes; # of downloads/views
Technical Quality: QQC+Latency / Gappiness; Accuracy; Sampling issues? Caveats/known issues identified?
Processing: Has it been manipulated? Cal/Val state? Verification state?
Provenance: Maturity of platform/instrument/sensor; Maturity of Program; Parent datasets identified (if applicable); Is the sensor fully described? Is the context of the reading(s) fully described? State-of-the-Art technology?
Documentation: What is the state of the documentation? Is the documentation captured (archived)?
Adherence to Process Guidelines: Did it get fast-tracked? Tons of waivers? Were all exit criteria met satisfactorily? Consistent use of units?
Access: Readily available? Foreign repository? Behind firewalls or open FTP?
Toolkits: Data visualization routine? Data reader? Verified reader/subroutine?
Relationships: Sibling/child datasets identified? Motivation/justification identified?
Rarity: Hard-to-find data? Atypical sensor/resolution/etc.?
Specification: Resolution (spatial / temporal); Spatial coverage; Start time; End time; Data format? Exotic structure? Sizing / volume expectation?
Comments/Thoughts/Questions?
[Flowchart repeated, highlighting Step 4.]