Top air quality import headaches – Dataflows: D/E1a/E2a
Peter Kjeld, IPR, April 2017
What are the typical reasons data don't get processed?
No. 11 – XML not valid
Cause: XML is not well-formed or fails to validate against the schema
Result: The XML is not imported
Notes: D and E1a are blocked in CDR; E2a fails in processing
Solution: Fix the XML
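As a sanity check before upload, a delivery can be validated locally. A minimal sketch using Python and lxml, assuming the AQD schema has been downloaded; the file names are hypothetical:

```python
# Minimal sketch: pre-checking a delivery before upload, assuming lxml
# is installed and the XSD is stored locally. File names are
# illustrative examples, not actual CDR paths.
from lxml import etree

def check_delivery(xml_path: str, xsd_path: str) -> bool:
    try:
        doc = etree.parse(xml_path)       # raises here if not well-formed
    except etree.XMLSyntaxError as err:
        print(f"Not well-formed: {err}")
        return False
    schema = etree.XMLSchema(etree.parse(xsd_path))
    if not schema.validate(doc):          # fails here if schema-invalid
        for error in schema.error_log:
            print(f"Line {error.line}: {error.message}")
        return False
    return True

# Example: check_delivery("DK_E1a_2016.xml", "AirQualityReporting.xsd")
```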
No. 10 – Dataflow E does not match D
Cause: The observation constellations provided in E do not match what is defined in the metadata section (D)
Typical reason: The localId of the SamplingProcess has changed
Result: Data values from E are imported and the metadata are created on-the-fly, though flagged as 'Undelivered' and 'Created-on-the-fly'
Note: Before a new D is imported, all existing metadata in the database are flagged ('Undelivered') and then unflagged as the D is imported. After the D is imported, some metadata might still be flagged; they stay flagged until a new D is delivered that fixes the problem.
Note: The 'Undelivered' and 'Created-on-the-fly' flags solve the D/E timing problem (a sketch of the flag lifecycle follows below)
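A minimal sketch of how this flag lifecycle could work, under assumed names and structures (not the actual import code or database schema):

```python
# Illustrative sketch of the 'Undelivered'/'Created-on-the-fly' flag
# lifecycle described above. Names and structures are assumptions.
metadata = {}  # (sampling_point, pollutant) -> {"flags": set()}

def import_d(delivered_constellations):
    # Step 1: flag all existing metadata as undelivered.
    for meta in metadata.values():
        meta["flags"].add("Undelivered")
    # Step 2: unflag whatever the new D actually contains.
    for key in delivered_constellations:
        meta = metadata.setdefault(key, {"flags": set()})
        meta["flags"].discard("Undelivered")
        meta["flags"].discard("Created-on-the-fly")

def import_e_value(key):
    # Unknown constellation: create metadata on-the-fly, but flag it
    # so it is not used until a D delivery confirms it.
    if key not in metadata:
        metadata[key] = {"flags": {"Undelivered", "Created-on-the-fly"}}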
No. 9 – Timing problem D/E
Cause: An E dataset is processed before the matching D is processed
Typical reason: The yearly delivery (September) covers most dataflows, and they are released at the same time
Result: Dataset E can end up being processed before the D and would not be imported, because the metadata are not recognized
Solution: All data from E are imported; if the metadata are not recognized, they are partially created on-the-fly based on the information found in E (see the sketch below). Metadata 'created-on-the-fly' are flagged and not used until a D dataset unflags them.
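A minimal sketch of partially creating metadata from an E observation; the field names are illustrative assumptions:

```python
# Sketch: building placeholder metadata from the fields available in an
# E observation when no matching D metadata exists. Field names assumed.
def create_on_the_fly(e_observation):
    return {
        "sampling_point": e_observation["sampling_point"],
        "pollutant": e_observation["pollutant"],
        # Everything D would normally provide is missing here, so the
        # record is only partially populated and stays flagged.
        "flags": {"Undelivered", "Created-on-the-fly"},
    }
```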
No. 8 – From- and to-datetime do not match the value quantity
Cause: The quantity of Value, e.g. '/hour', does not match what is delivered in the swe:values array
Example: Data should be hourly but are reported as, say, a 55-minute average
Result: When the data are obviously hourly, the quantity is changed; otherwise the data are imported as 'var' and are not used in most aggregations
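A minimal sketch of the underlying check; the 5-minute tolerance for "obviously hourly" is an illustrative assumption, not stated on the slide:

```python
# Sketch: deciding the effective quantity from the from-/to-datetime
# span in swe:values. The tolerance is an assumption for illustration.
from datetime import datetime, timedelta

def effective_quantity(start: datetime, end: datetime) -> str:
    span = end - start
    # Obviously hourly: keep/correct the quantity to '/hour'.
    if abs(span - timedelta(hours=1)) <= timedelta(minutes=5):
        return "/hour"
    # Otherwise import as 'var', which most aggregations ignore.
    return "var"
```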
No. 7 – Daily data always from midnight
Cause: The quantity of Value is set to '/day', but the from- and to-datetimes in the swe:values array include hour information
Result: The datetimes are rounded to midnight and the timezone offset is removed
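A minimal sketch of this normalisation, assuming the rounding is downwards (the slide only says "rounded"):

```python
# Sketch: normalising a daily value's timestamps per the rule above.
# An illustration, not the actual import code.
from datetime import datetime

def to_midnight(dt: datetime) -> datetime:
    # Round down to midnight and drop the timezone offset.
    return dt.replace(hour=0, minute=0, second=0, microsecond=0, tzinfo=None)

# Example: 2017-04-03T06:00:00+01:00 becomes 2017-04-03T00:00:00
```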
No. 6 – Validity flag
Cause: Data are flagged as 'Not valid' (-1 or -99)
Result: Data flagged as 'Not valid' (-1 or -99) are not used in any aggregations/statistics and are not disseminated
Note: E2a data would normally be flagged as 'Valid' (1) and 'Not verified' (3); E1a data would normally be flagged as 'Valid' (1) and 'Verified' (1)
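A minimal sketch of the exclusion rule, using the flag values quoted above; the record layout is an assumption:

```python
# Sketch: excluding 'Not valid' observations from aggregation/statistics.
NOT_VALID = {-1, -99}

def usable(records):
    # Only records not flagged 'Not valid' take part in aggregations.
    return [r for r in records if r["validity"] not in NOT_VALID]
```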
No. 5 – Validity / Verification flag
Cause: The value of the validity and/or verification flag is '0'
Result: '0' is not a valid value for Validity or Verification
Validity = '0' is changed to '-1' ('Not valid')
Verification = '0' is changed to '3' ('Not verified')
A warning is written to the logfile
Consequence: If data are stored as 'Not valid', they are not included in aggregation/statistics calculations and are not disseminated further (see the sketch below)
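A minimal sketch of this normalisation with the logged warning; the logger name is illustrative:

```python
# Sketch of the '0' normalisation rule described above, with a warning
# written to the log. Logger setup is an illustrative assumption.
import logging

log = logging.getLogger("aq_import")

def normalise_flags(validity: int, verification: int):
    if validity == 0:
        log.warning("Validity flag 0 changed to -1 ('Not valid')")
        validity = -1
    if verification == 0:
        log.warning("Verification flag 0 changed to 3 ('Not verified')")
        verification = 3
    return validity, verification
```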
No. 4 – Order of fields
Cause: The order of fields as defined has to match the order in which they are delivered
Result: If the order is not respected, data might be wrongly imported
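A minimal sketch of why the order matters when decoding a swe:values row; the declared field list and the sample row are illustrative assumptions:

```python
# Sketch: decoding one swe:values row against the declared field order.
declared_fields = ["StartTime", "EndTime", "Verification", "Validity", "Value"]
row = "2017-04-03T00:00:00+01:00,2017-04-03T01:00:00+01:00,3,1,23.5"

record = dict(zip(declared_fields, row.split(",")))
# If the file delivers fields in a different order than declared, the
# values above silently land in the wrong columns.
```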
No. 3 – UTD (E2a) data not in map
Cause 1: Data flagged as 'Not valid' are excluded from maps, aggregations and further dissemination
Cause 2: Data coverage < 70% (daily)
Best practice: 'Valid' (1) and 'Not verified' (3). UTD data should be flagged as 'Valid' (1) and any of the verification flags: 'Verified', 'Preliminary verified' or 'Not verified'. As UTD data have normally not gone through a verification process, they would be flagged as 'Not verified'.
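A minimal sketch of the daily coverage rule, assuming hourly UTD values and using the 70% threshold from the slide:

```python
# Sketch: a day of up-to-date (UTD) hourly data only reaches the map if
# at least 70% of its 24 hours carry valid values. Layout is assumed.
NOT_VALID = {-1, -99}

def shown_on_map(hourly_records) -> bool:
    valid = [r for r in hourly_records if r["validity"] not in NOT_VALID]
    return len(valid) / 24 >= 0.70
```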
No. 2 – Data overlap – internal XML
Cause: Inconsistent data within the XML file, i.e. multiple values for the same samplingpoint/pollutant/time
Result: Data are not imported
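A minimal sketch of the internal overlap check, keyed as the slide describes; the record layout is an assumption:

```python
# Sketch: detecting internal overlap before import, keyed on
# samplingpoint/pollutant/time as described above.
def has_internal_overlap(records) -> bool:
    seen = set()
    for r in records:
        key = (r["sampling_point"], r["pollutant"], r["start_time"])
        if key in seen:
            return True   # same constellation and time reported twice
        seen.add(key)
    return False
```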
No. 1 – Data overlap – database
Cause: Before data are inserted into the database, we check that they do not overlap data inserted previously
Check: Overlaps are checked at the SamplingPoint + Pollutant level
Result: If an overlap is identified, the new data overwrite the old data; a warning is written to the log
Note: The 'intersect' rule can leave a gap in the result (see the sketch below)
[Figure: timelines of existing data (db), new data and the resulting data (db), illustrating the gap left by the 'intersect' rule]
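A minimal sketch of the 'intersect' rule and how it can leave a gap, under an assumed interval model:

```python
# Sketch of the 'intersect' rule: existing rows that intersect the new
# delivery's period are dropped whole before the new rows are inserted.
def apply_delivery(existing, new_start, new_end, new_rows):
    # Keep only existing rows that do NOT intersect the new delivery.
    kept = [r for r in existing
            if r["end"] <= new_start or r["start"] >= new_end]
    # Any part of a dropped row that lay outside the new period is now
    # covered by neither old nor new data: that is the gap.
    return kept + new_rows
```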
Observation constellation – different approach
Before: Primary key constellation: SamplingPoint, Pollutant, FeatureOfInterest, Process
Problem: Process has caused a lot of errors in the past; FeatureOfInterest is not used
Now: Primary key: SamplingPoint, Pollutant
Clearer identification and avoidance of data overlaps
Consistency between data load and aggregations/statistics
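A minimal sketch contrasting the old and new keys; the observation record layout is an illustrative assumption:

```python
# Sketch: old vs. new observation constellation keys described above.
def old_key(obs):
    return (obs["sampling_point"], obs["pollutant"],
            obs["feature_of_interest"], obs["process"])

def new_key(obs):
    # Dropping Process (error-prone) and FeatureOfInterest (unused)
    # makes overlap checks match the level used by aggregations.
    return (obs["sampling_point"], obs["pollutant"])
```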