Documenting Data Quality
Ted Habermann, NOAA/NESDIS/NGDC

Documentation: It’s not just discovery...

High-quality documentation serves many roles. Metadata standards and the World Wide Web emerged at the same time, in the mid-1990s, and that coincidence focused significant attention on the discovery role. More recently, international collaboration, the Open Archival Information System (OAIS) Reference Model, increased data transparency, and public scrutiny of all climate data and interpretations have refocused attention on a different role: enabling independent understanding of data.

This figure shows the global average of a parameter calculated from a NESDIS satellite product between 2002 and 2006. There is an obvious 50% increase in this parameter during late 2002. Such an abrupt, large change would not be expected over the whole globe. Why did this happen?

The text box reflects the current state of the documentation for this dataset. The scientist responsible for the product searched their e-mail archive and could reconstruct only a vague explanation of the change from indirect references in old messages. The closing statement, "hopefully this settles the issue..", may satisfy the community of experts who know these data, but it is unlikely to satisfy non-experts who question the integrity of scientists and the scientific process. Moreover, e-mail archives and personal recollections cannot be reliably preserved.

Unfortunately, this example is typical rather than exceptional.
DQ_Scope

<<CodeList>> MD_ScopeCode
  + attribute
  + feature
  + attributeType
  + featureType
  + collectionHardware
  + propertyType
  + collectionSession
  + fieldSession
  + dataset
  + software
  + series
  + service
  + nonGeographicDataset
  + model
  + dimensionGroup
  + tile

<<DataType>> DQ_Scope
  + level : MD_ScopeCode
  + extent [0..1] : EX_Extent
  + levelDescription [0..*] : MD_ScopeDescription

<<Union>> MD_ScopeDescription
  + attributes : Set<GF_AttributeType>
  + features : Set<GF_FeatureType>
  + featureInstances : Set<GF_FeatureType>
  + attributeInstances : Set<GF_AttributeType>
  + dataset : CharacterString
  + other : CharacterString
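To make the scope model concrete, here is a minimal Python sketch of the three classes on this slide. It is an illustration under stated assumptions, not an ISO 19115 binding: EX_Extent is simplified to a text string, GF_AttributeType and GF_FeatureType are represented by name strings, and the attribute name "hypothetical_parameter" is invented. The <<Union>> stereotype is enforced by requiring exactly one populated member.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class MD_ScopeCode(Enum):
    """The sixteen MD_ScopeCode values listed on the slide."""
    ATTRIBUTE = "attribute"
    FEATURE = "feature"
    ATTRIBUTE_TYPE = "attributeType"
    FEATURE_TYPE = "featureType"
    COLLECTION_HARDWARE = "collectionHardware"
    PROPERTY_TYPE = "propertyType"
    COLLECTION_SESSION = "collectionSession"
    FIELD_SESSION = "fieldSession"
    DATASET = "dataset"
    SOFTWARE = "software"
    SERIES = "series"
    SERVICE = "service"
    NON_GEOGRAPHIC_DATASET = "nonGeographicDataset"
    MODEL = "model"
    DIMENSION_GROUP = "dimensionGroup"
    TILE = "tile"


_UNION_MEMBERS = ("attributes", "features", "feature_instances",
                  "attribute_instances", "dataset", "other")


@dataclass
class MD_ScopeDescription:
    """<<Union>>: exactly one member must be populated.

    GF_AttributeType / GF_FeatureType are represented by name strings (assumption).
    """
    attributes: Optional[Set[str]] = None
    features: Optional[Set[str]] = None
    feature_instances: Optional[Set[str]] = None
    attribute_instances: Optional[Set[str]] = None
    dataset: Optional[str] = None
    other: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject empty or multi-member instances to mimic union semantics.
        populated = [n for n in _UNION_MEMBERS if getattr(self, n) is not None]
        if len(populated) != 1:
            raise ValueError(f"exactly one union member required, got {populated}")


@dataclass
class DQ_Scope:
    level: MD_ScopeCode                        # mandatory
    extent: Optional[str] = None               # [0..1] EX_Extent, simplified to text
    level_description: List[MD_ScopeDescription] = field(default_factory=list)  # [0..*]


# Usage: scope a quality statement to one (hypothetical) attribute of a dataset.
scope = DQ_Scope(
    level=MD_ScopeCode.ATTRIBUTE,
    extent="global, 2002-2006",
    level_description=[MD_ScopeDescription(attributes={"hypothetical_parameter"})],
)
```

Using an Enum keyed by the camelCase code values means a value read from an XML record can be looked up directly, for example `MD_ScopeCode("nonGeographicDataset")`.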
DQ_Result

DQ_Result
  + resultScope : DQ_Scope [0..1]

DQ_ConformanceResult
  + specification : CI_Citation
  + explanation : CharacterString
  + pass : Boolean

DQ_QuantitativeResult
  + valueType [0..1] : RecordType
  + valueUnit : UnitOfMeasure
  + errorStatistic [0..1] : CharacterString
  + value [1..*] : Record

DQ_DescriptiveResult
  + statement : CharacterString

QE_CoverageResult
  + resultFile : MX_DataFile
  + resultFormat : MD_Format
  + resultContentDescription : MD_CoverageDescription
  + resultSpatialRepresentation : MD_SpatialRepresentation
  + spatialRepresentationType : MD_SpatialRepresentationTypeCode
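The result types differ mainly in what they record: a pass/fail verdict against a cited specification, one or more measured values with a unit, or a free-text statement. A minimal Python sketch of three of them follows, assuming CI_Citation is reduced to a title string, UnitOfMeasure to a unit name, Record to a number, and resultScope to text. QE_CoverageResult is omitted because its attribute types would all need their own models. The `summarize` helper is hypothetical and exists only to show how a consumer might dispatch on result type.

```python
from dataclasses import dataclass
from typing import List, Optional


class DQ_Result:
    """Abstract parent of the result types; not instantiated directly."""


@dataclass
class DQ_ConformanceResult(DQ_Result):
    specification: str                  # CI_Citation, simplified to a title
    explanation: str
    passed: bool                        # "pass" is a Python keyword, so renamed
    result_scope: Optional[str] = None  # [0..1] DQ_Scope, simplified to text


@dataclass
class DQ_QuantitativeResult(DQ_Result):
    value_unit: str                     # UnitOfMeasure, simplified to a unit name
    value: List[float]                  # [1..*] Record, simplified to numbers
    value_type: Optional[str] = None    # [0..1] RecordType
    error_statistic: Optional[str] = None
    result_scope: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the [1..*] multiplicity of value.
        if not self.value:
            raise ValueError("value has multiplicity [1..*]")


@dataclass
class DQ_DescriptiveResult(DQ_Result):
    statement: str
    result_scope: Optional[str] = None


def summarize(result: DQ_Result) -> str:
    """Render any supported result as one line of text (hypothetical helper)."""
    if isinstance(result, DQ_ConformanceResult):
        verdict = "PASS" if result.passed else "FAIL"
        return f"{verdict} against {result.specification}"
    if isinstance(result, DQ_QuantitativeResult):
        values = ", ".join(str(v) for v in result.value)
        return f"{values} {result.value_unit}"
    if isinstance(result, DQ_DescriptiveResult):
        return result.statement
    raise TypeError(f"unsupported result type: {type(result).__name__}")


# Usage with illustrative values only.
q = DQ_QuantitativeResult(value_unit="percent", value=[50.0])
c = DQ_ConformanceResult(specification="Hypothetical product specification",
                         explanation="Illustrative only", passed=False)
print(summarize(q))  # 50.0 percent
print(summarize(c))  # FAIL against Hypothetical product specification
```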
Measure Registry / Database

DQ_MeasureReference
  + measureIdentification : MD_Identifier [0..1]
  + nameOfMeasure : CharacterString [0..*]
  + measureDescription : CharacterString [0..1]

Quality Measure (registry entry)
  measure identifier
  name
  alias
  element name
  basic measure
  definition
  description
  parameter
  value type
  value structure
  source reference
  example

<<DataType>> MD_Identifier
  + authority [0..1] : CI_Citation
  + code : CharacterString
  + codeSpace [0..1] : CharacterString
  + version [0..1] : CharacterString
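The point of this slide is that a DQ_MeasureReference in a metadata record can stay short, because the full measure definition lives in a registry and is reached through an MD_Identifier. A minimal Python sketch of that lookup follows. The `MeasureRegistry` class, its in-memory dictionary, the identifier "M-1" and the code space "example.org/measures" are all hypothetical. Which registry components are mandatory is also an assumption here (only identifier and name are required).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MD_Identifier:
    code: str
    code_space: Optional[str] = None
    version: Optional[str] = None
    authority: Optional[str] = None     # CI_Citation, simplified to a title


@dataclass
class QualityMeasure:
    """One registry entry; fields follow the component list on the slide."""
    measure_identifier: MD_Identifier
    name: str
    alias: List[str] = field(default_factory=list)
    element_name: List[str] = field(default_factory=list)
    basic_measure: Optional[str] = None
    definition: Optional[str] = None
    description: Optional[str] = None
    parameter: List[str] = field(default_factory=list)
    value_type: Optional[str] = None
    value_structure: Optional[str] = None
    source_reference: List[str] = field(default_factory=list)
    example: List[str] = field(default_factory=list)


@dataclass
class DQ_MeasureReference:
    measure_identification: Optional[MD_Identifier] = None   # [0..1]
    name_of_measure: List[str] = field(default_factory=list) # [0..*]
    measure_description: Optional[str] = None                # [0..1]


class MeasureRegistry:
    """Hypothetical in-memory registry keyed by (codeSpace, code)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[Optional[str], str], QualityMeasure] = {}

    def register(self, measure: QualityMeasure) -> None:
        ident = measure.measure_identifier
        self._entries[(ident.code_space, ident.code)] = measure

    def resolve(self, ref: DQ_MeasureReference) -> Optional[QualityMeasure]:
        # Only the identifier is used for lookup; names and descriptions are informative.
        ident = ref.measure_identification
        if ident is None:
            return None
        return self._entries.get((ident.code_space, ident.code))


# Usage: register one hypothetical measure, then resolve a reference to it.
registry = MeasureRegistry()
ident = MD_Identifier(code="M-1", code_space="example.org/measures")
registry.register(QualityMeasure(measure_identifier=ident, name="hypothetical measure"))
ref = DQ_MeasureReference(measure_identification=ident,
                          name_of_measure=["hypothetical measure"])
found = registry.resolve(ref)
```

This sketch ignores `version` when matching. A real registry would probably need a policy for references that cite an older version of a measure.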
Data Quality - Granules
Data Quality - Standards

MI_Metadata

LI_Lineage

DQ_DataQuality
  + scope : DQ_Scope
  association roles: standAloneReport [0..1], report [0..*]

DQ_StandaloneReportInformation
  + reportReference : CI_Citation
  + abstract : CharacterString

<<Abstract>> DQ_Element

DQ_MeasureReference

DQ_Evaluation

<<CodeList>> MD_EvaluationMethodTypeCode
  + directInternal
  + directExternal
  + indirect

DQ_Result
  + resultScope : DQ_Scope [0..1]

DQ_ConformanceResult
  + specification : CI_Citation
  + explanation : CharacterString
  + pass : Boolean

DQ_QuantitativeResult
  + valueType [0..1] : RecordType
  + valueUnit : UnitOfMeasure
  + errorStatistic [0..1] : CharacterString
  + value [1..*] : Record

DQ_DescriptiveResult

DQ_CoverageResult

<<DataType>> DQ_Scope
  + level : MD_ScopeCode
  + extent [0..1] : EX_Extent
  + levelDescription [0..*] : MD_ScopeDescription

<<CodeList>> MD_ScopeCode
  + attribute
  + feature
  + attributeType
  + featureType
  + collectionHardware
  + propertyType
  + collectionSession
  + fieldSession
  + dataset
  + software
  + series
  + service
  + nonGeographicDataset
  + model
  + dimensionGroup
  + tile

<<Union>> MD_ScopeDescription
  + attributes : Set<GF_AttributeType>
  + features : Set<GF_FeatureType>
  + featureInstances : Set<GF_FeatureType>
  + attributeInstances : Set<GF_AttributeType>
  + dataset : CharacterString
  + other : CharacterString
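The overview diagram ties these pieces together: a DQ_DataQuality section carries a scope, any number of report elements, and optionally a pointer to a stand-alone report. The following minimal Python sketch shows that top-level structure under heavy simplification. The scope is reduced to an MD_ScopeCode string, the measure reference to a name, and results to text. A single concrete class stands in for the abstract DQ_Element. Attaching MD_EvaluationMethodTypeCode to DQ_Evaluation is an assumption about how the diagram's boxes connect. The `elements_by_method` helper and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MD_EvaluationMethodTypeCode(Enum):
    DIRECT_INTERNAL = "directInternal"
    DIRECT_EXTERNAL = "directExternal"
    INDIRECT = "indirect"


@dataclass
class DQ_StandaloneReportInformation:
    report_reference: str               # CI_Citation, simplified to a title
    abstract: str


@dataclass
class DQ_Evaluation:
    # Carrying the method type here is an assumption about the diagram's links.
    method_type: MD_EvaluationMethodTypeCode
    description: Optional[str] = None


@dataclass
class DQ_Element:
    # Concrete stand-in for the abstract DQ_Element.
    measure_name: str                   # simplified DQ_MeasureReference
    evaluation: DQ_Evaluation
    results: List[str] = field(default_factory=list)  # simplified DQ_Result summaries


@dataclass
class DQ_DataQuality:
    scope_level: str                    # simplified DQ_Scope (an MD_ScopeCode value)
    report: List[DQ_Element] = field(default_factory=list)                # [0..*]
    stand_alone_report: Optional[DQ_StandaloneReportInformation] = None  # [0..1]

    def elements_by_method(self, method: MD_EvaluationMethodTypeCode) -> List[DQ_Element]:
        """Return the report elements evaluated with the given method type."""
        return [e for e in self.report if e.evaluation.method_type is method]


# Usage with hypothetical measures and a hypothetical stand-alone report.
dq = DQ_DataQuality(
    scope_level="dataset",
    report=[
        DQ_Element("hypothetical measure A",
                   DQ_Evaluation(MD_EvaluationMethodTypeCode.DIRECT_INTERNAL)),
        DQ_Element("hypothetical measure B",
                   DQ_Evaluation(MD_EvaluationMethodTypeCode.INDIRECT)),
    ],
    stand_alone_report=DQ_StandaloneReportInformation(
        "Hypothetical validation report", "Illustrative abstract"),
)
indirect = dq.elements_by_method(MD_EvaluationMethodTypeCode.INDIRECT)
```

Keeping the stand-alone report as an optional sibling of the report list mirrors the [0..1] and [0..*] multiplicities on the slide. A record can therefore point to a detailed external document and still carry machine-readable elements.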
Community - the Wiki