Datzilla Usage: NCDC Perspective
Karsten Shein, NOAA National Climatic Data Center
NWS Sub-regional data stewardship workshop – Fargo, ND, May 2007
2 What is Datzilla?
Datzilla is a Web-based tool for reporting and tracking errors in NOAA-held data, metadata, or data delivery systems. It is maintained by the Southern Regional Climate Center. NCDC participates to the extent of addressing errors in the data it holds and the systems it maintains.
3 Errors vs. Corrections
Datzilla is intended to correct errors.
Corrections to observations are not errors.
For changes to recent observations (common):
  – William Angel (COOP / ASOS)
  – Blake Lasher (LCD ASOS)
  – Send amended form to NCDC via normal channels.
Changes to historical observations (uncommon):
  – Amended forms
  – Datzilla (limited to changing estimates of invalid obs)
  – Must have a compelling reason to change a historical obs.
Corrections due to equipment problems may also require a metadata rendition.
4 When NCDC receives a Datzilla ticket …
Datzilla gatekeeper
Initial determinations
Reassignment
Investigation
Course of action
Resolution
Closure
Your friendly Datzilla Gatekeeper
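The stages listed on this slide can be read as a simple linear workflow. The following is a minimal Python sketch of that reading. The stage names come from the slide, but the `TicketStage` enum, the `advance` and `remaining_stages` helpers, and the assumption that stages are visited strictly in order are illustrative only and do not describe how Datzilla is actually implemented.

```python
from enum import Enum
from typing import List, Optional


class TicketStage(Enum):
    """Stages named on the slide, in the order they are listed."""
    GATEKEEPER = 1
    INITIAL_DETERMINATIONS = 2
    REASSIGNMENT = 3
    INVESTIGATION = 4
    COURSE_OF_ACTION = 5
    RESOLUTION = 6
    CLOSURE = 7


def advance(stage: TicketStage) -> Optional[TicketStage]:
    """Return the next stage, or None once the ticket is closed.

    Assumption: stages are strictly sequential. A real ticket may loop
    back, for example a second reassignment after investigation.
    """
    if stage is TicketStage.CLOSURE:
        return None
    return TicketStage(stage.value + 1)


def remaining_stages(stage: TicketStage) -> List[str]:
    """List the names of the stages still ahead of a ticket."""
    ahead = []
    nxt = advance(stage)
    while nxt is not None:
        ahead.append(nxt.name)
        nxt = advance(nxt)
    return ahead
```

Calling `remaining_stages(TicketStage.INVESTIGATION)` lists the three stages still ahead of a ticket that is under investigation, which is one way to see why a ticket can stay open well after it has been acknowledged.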
5 Some possible reasons for a Datzilla ticket to NCDC
Data issue (usually)
  – That TMAX of 74 should be a 47!
  – Your QC clobbered my data!
System issue
  – CDO inventory doesn’t match the data I got!
Metadata issue
  – This station’s COOP number is wrong!
6 Good tickets and Bad
Pre-report investigation (critical)
  – Is the error really an error?
  – Is it the right station?
  – Is it reproducible (browsers, systems)?
Do you have the right system?
  – ACIS, ThreadEx, and NCDC data may be different
Have you proposed a solution?
7 A good ticket
Clear title
Good description of problem
ID of station
Attached form
Proposed solution
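The checklist on this slide can be applied mechanically before a ticket is submitted. The sketch below is one hypothetical way to do that in Python. The `TicketDraft` class and its field names are assumptions made for illustration and are not Datzilla's actual form fields. The check can only tell whether each element is present, not whether a title is clear or a description is good.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TicketDraft:
    """Hypothetical draft ticket; field names are illustrative only."""
    title: str = ""
    description: str = ""
    station_id: str = ""
    attachments: List[str] = field(default_factory=list)
    proposed_solution: str = ""


def missing_elements(draft: TicketDraft) -> List[str]:
    """Return the checklist items from the slide that the draft lacks."""
    checks = [
        ("clear title", bool(draft.title.strip())),
        ("description of problem", bool(draft.description.strip())),
        ("station ID", bool(draft.station_id.strip())),
        ("attached form", len(draft.attachments) > 0),
        ("proposed solution", bool(draft.proposed_solution.strip())),
    ]
    # Keep only the items whose presence check failed.
    return [name for name, present in checks if not present]
```

An empty result from `missing_elements` means every element on the slide is at least present. Anything it returns is worth filling in before the ticket reaches the gatekeeper.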
8 A bad ticket
Vague title
Vague description of problem
No ID of specific station, element, date
No supporting material
No suggested fix
Hunches
Data “corrections”
9 Investigating the error
xmACIS is a display tool for NOAA data
  – Interprets data flags from NCDC data sets
  – Prioritizes simultaneous values from different data sets
  – Displays “M” for invalid as well as missing data
  – May thread data from several stations
Primary, Secondary, Tertiary forms:
  – 1001, B-91, 10-A, …
  – 1014, LCD, CD, CRB
  – Media articles, eye-witness accounts, proxy data
Paper, WSSRD, IPS …
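The point about “M” matters when investigating: a displayed “M” does not by itself say whether a value was never received or was received and then flagged invalid. The Python sketch below illustrates that ambiguity. It is not xmACIS code. The `display_value` and `underlying_state` functions and the boolean `flagged_invalid` argument are hypothetical stand-ins for whatever flags the underlying data set carries.

```python
from typing import Optional


def display_value(value: Optional[float], flagged_invalid: bool) -> str:
    """Render one observation so that "M" covers two different cases.

    Hypothetical display rule based on the slide: a value that is absent
    and a value that was flagged invalid both appear as "M".
    """
    if value is None or flagged_invalid:
        return "M"
    return f"{value:g}"


def underlying_state(value: Optional[float], flagged_invalid: bool) -> str:
    """Name the case that the display hides."""
    if value is None:
        return "missing"
    if flagged_invalid:
        return "invalid"
    return "valid"


# Two observations that look identical on screen but differ in the source data.
never_received = (None, False)
failed_qc = (74.0, True)
assert display_value(*never_received) == display_value(*failed_qc) == "M"
assert underlying_state(*never_received) != underlying_state(*failed_qc)
```

Because the two cases call for different fixes, checking the source data set or the original form before filing a ticket helps establish which one applies.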
10 My ticket’s been open forever!
Some are easy to fix
  – Small-batch keying errors
  – A few missing values
Some are not
  – May be part of a larger issue
  – May exceed NCDC resources
  – May not have a straightforward fix
  – May involve parties/systems outside NCDC
11 Your QC clobbered my data!
You are right to alert us.
However, before you do …
  – Think about why our QC would flag that data.
  – Most often, the problem is not our QC:
      Observer date shifting
      Observing at a time different than expected
  – Known climate anomalies?
  – Unreported temporary anomalies?
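Observer date shifting can often be spotted before a ticket is filed by comparing the station's daily values with a nearby station at small day offsets. The Python sketch below shows one generic way to do that. It is not NCDC's QC procedure, and the `mean_abs_diff` and `best_lag` functions, the choice of a single neighbor, and the sample values are all illustrative assumptions.

```python
from typing import Optional, Sequence


def mean_abs_diff(station: Sequence[Optional[float]],
                  neighbor: Sequence[Optional[float]],
                  lag: int) -> float:
    """Mean |station[i] - neighbor[i + lag]| over days where both exist."""
    diffs = []
    for i, s in enumerate(station):
        j = i + lag
        if 0 <= j < len(neighbor) and s is not None and neighbor[j] is not None:
            diffs.append(abs(s - neighbor[j]))
    # No overlapping days means the lag cannot be evaluated.
    return sum(diffs) / len(diffs) if diffs else float("inf")


def best_lag(station: Sequence[Optional[float]],
             neighbor: Sequence[Optional[float]],
             lags: Sequence[int] = (-1, 0, 1)) -> int:
    """Return the day offset at which the two series agree best.

    A result other than 0 suggests the station's values may be recorded
    against the wrong calendar day.
    """
    return min(lags, key=lambda lag: mean_abs_diff(station, neighbor, lag))


# Made-up daily TMAX values: the station logs each value one day late.
neighbor_tmax = [40, 45, 60, 52, 38, 41, 55]
station_tmax = [None, 40, 45, 60, 52, 38, 41]
shift = best_lag(station_tmax, neighbor_tmax)
```

In this made-up example `shift` is -1, meaning each station value matches the neighbor's value from the previous day. A nonzero offset is only a hint, since real neighbors differ for legitimate reasons, so it should be checked against the original form.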
12 Should that 850°F TMAX really be invalidated?
Fuel for the Fire
13 One NOAA
We’re all interested in having the best NOAA data possible.
Better data means more accurate forecasts and research.
NCDC has limits on what it can do to data.
We will address and work to fix all legitimate errors.
14 Any questions?
Once again, your friendly (no, not that friendly) NCDC Datzilla Gatekeeper!
NCDC Halloween, 2006