Automatic checks: current & future EEA-ETC/ACM | 25th of April 2017 | 3rd IPR Technical meeting, Copenhagen
Automatic checks
Why is QA/QC important? Still crucial for: Country; EEA (both IT & content); Commission
What’s new? Improve check/error categories; On-going revision of checks (key checks); Revision of unnecessary checks following schema update; Where possible, move Tier 2 checks into automatic Tier 1
New check categories Categories: BLOCKER, ERROR, WARNING, INFO, OK, SKIPPED
New check categories BLOCKER: A critical error. The envelope cannot be released. Normally, a blocker is an error in the format of the file, or in the structure or content of the data. Such a critical error makes it impossible for the delivery to be harvested and integrated into the database. The envelope can only be released if every incorrect file is removed and replaced by a corrected file.
New check categories ERROR: An important error. The envelope can be released, but part of its content may be excluded from the European database. Data Reporters are strongly advised to correct errors. A clarification or a resubmission WILL be requested when the data is processed and the final feedback is added to the envelope.
New check categories WARNING: An issue that may be an error. Data Reporters are advised to check the correctness of the records or values that raised the warning. The envelope can be released. Data Reporters are strongly advised to check warnings. A clarification or a resubmission MAY be requested when the data is processed and the final feedback is added to the envelope.
New check categories INFO: Other issues related to the quality of the data or content of the submission. The envelope can be released.
New check categories OK: The automatic QC did not detect quality issues. The envelope can be released.
New check categories SKIPPED: The data check was executed, but there was "Nothing found to check", typically because optional data is missing.
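The release rule described by these categories can be sketched as follows. This is a minimal illustration, not the actual QA implementation: the names `Category`, `envelope_category` and `can_release` are hypothetical, and the numeric ranks (including the relative order of INFO, OK and SKIPPED) are an assumption based on the order in which the categories are listed.

```python
from enum import Enum


class Category(Enum):
    """Feedback categories; the numeric ranks are an assumed severity order."""
    BLOCKER = 5
    ERROR = 4
    WARNING = 3
    INFO = 2
    OK = 1
    SKIPPED = 0


def envelope_category(check_results):
    """Return the most severe category among all check results."""
    return max(check_results, key=lambda c: c.value, default=Category.SKIPPED)


def can_release(check_results):
    """Only a BLOCKER prevents the envelope from being released."""
    return envelope_category(check_results) is not Category.BLOCKER
```

With this sketch, an envelope whose checks return OK, WARNING and SKIPPED is summarised as WARNING and can still be released, whereas a single BLOCKER stops the release.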
On-going improvements on QA: Ensure QA works 100%; Improve user-friendly text; Add/implement checks on initial content; Check namespace is consistent with country
On-going improvements on QA: Improve cross-checks (year specific); Downgrade any unnecessary check (to warning); Upgrade critical issues (to blockers); Add checks for mandatory elements (if missing)
On-going improvements on QA: ETC/ACM is working with EEA to improve QA following: Internal consistency checks; Feedback report to countries; Reported issues by countries (via helpdesk); Improvement on Documentation
Checks on initial content: Initial checks on content are done on ALL data flows: Verification of update/new delivery; Compare localId vs existing in system. Especially for: Zone localId; Assessment Regime if updating report; Fixed measurement localId (NET, STA, SPO, SPP, SAM…); Attainment if updating report. Improvements are required following bugs found in 2017
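The comparison of delivered localIds against those already in the system could look roughly like the sketch below. The function name, the result keys and the example identifiers are hypothetical; the real check may treat updates and new deliveries differently.

```python
def classify_local_ids(delivered_ids, existing_ids):
    """Split delivered localIds into new ones and updates of existing ones.

    Also lists existing localIds that the delivery does not mention,
    which may point to an accidental rename of a localId.
    """
    delivered = set(delivered_ids)
    existing = set(existing_ids)
    return {
        "new": sorted(delivered - existing),
        "updated": sorted(delivered & existing),
        "not_redelivered": sorted(existing - delivered),
    }


# Hypothetical identifiers, for illustration only.
report = classify_local_ids(["STA-1", "STA-3"], ["STA-1", "STA-2"])
print(report)
```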
Check consistent namespace: Check correctness of Namespaces. New checks are: B10.1; C6.1; D7.1, D16.1, D32.1, D55.1 & D72.1; M7.1, M28.1 & M41.1; G9.1. Managed via codelist: http://dd.eionet.europa.eu/vocabulary/aq/namespace/view
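A namespace consistency check of this kind might be sketched as below. The layout of the codelist (a mapping from country code to the namespaces registered for it) and the example values are assumptions made for illustration; they are not taken from the actual vocabulary.

```python
def namespace_is_valid(country_code, namespace, allowed_namespaces):
    """Return True if the namespace is registered for the reporting country.

    allowed_namespaces is assumed to map a country code to a set of
    namespaces, e.g. as loaded from the namespace codelist.
    """
    return namespace in allowed_namespaces.get(country_code, set())


# Hypothetical codelist content, for illustration only.
codelist = {"XX": {"XX.EXAMPLE.AQ"}}
print(namespace_is_valid("XX", "XX.EXAMPLE.AQ", codelist))
print(namespace_is_valid("XX", "YY.OTHER.AQ", codelist))
```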
Improve cross-checks (year specific): Cross-checks are year specific (when feasible). Examples: C cross-checks of zones will only relate to the latest delivery in B for the same reporting year; G cross-checks of assessment regimes will relate to the same reporting year. This has improved reports and avoids some false errors
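A year-specific cross-check between C and B could be sketched as follows. The record layout (`reporting_year`, `released`, `zone_ids`) and the function names are hypothetical; the sketch only illustrates restricting the comparison to the latest B delivery for the same reporting year.

```python
from datetime import date


def latest_delivery_for_year(deliveries, reporting_year):
    """Pick the most recently released delivery for the given reporting year."""
    same_year = [d for d in deliveries if d["reporting_year"] == reporting_year]
    if not same_year:
        return None
    return max(same_year, key=lambda d: d["released"])


def unknown_zones(c_zone_ids, b_deliveries, reporting_year):
    """Return zones referenced in C that the latest B delivery does not define."""
    latest = latest_delivery_for_year(b_deliveries, reporting_year)
    if latest is None:
        return sorted(set(c_zone_ids))
    return sorted(set(c_zone_ids) - set(latest["zone_ids"]))


# Hypothetical deliveries, for illustration only.
b_deliveries = [
    {"reporting_year": 2016, "released": date(2016, 3, 1), "zone_ids": ["Z1", "Z2"]},
    {"reporting_year": 2016, "released": date(2016, 9, 1), "zone_ids": ["Z1"]},
]
print(unknown_zones(["Z1", "Z2"], b_deliveries, 2016))
```

Comparing only against the latest delivery for the same year is what prevents an older or other-year B delivery from raising, or hiding, an error.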
Improve on dataset B: B8 check & counts combination: B8 to be split into 3: B8a; B8b check Area coverage; B8c check Population sum
Improve on dataset B: B23 correct Latitude/Longitude: Turn to warning or add an exception for France. B39a B39b B39c – checks pollutant/target: Initial check to ensure completeness; Minimum 1 combination; No duplication
Improve on dataset C: C8 – mandatory pollutants: At least 1 assessment regime - blocker. C9 – pollutants with Monitoring Objective: Turn into positive message. C10 – C18 has become C10: Correct pollutant/objective/target/metric
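The "minimum 1 combination" and "no duplication" rules for pollutant/target combinations can be illustrated with a short sketch. The function name, the tuple representation and the example values are assumptions; how B39a, B39b and B39c divide these rules between them is not specified here.

```python
from collections import Counter


def check_combinations(combinations):
    """Check a list of (pollutant, target) pairs for presence and duplicates."""
    counts = Counter(combinations)
    return {
        "has_minimum": len(counts) >= 1,
        "duplicates": sorted(c for c, n in counts.items() if n > 1),
    }


# Hypothetical pollutant/target pairs, for illustration only.
pairs = [("PM10", "health"), ("NO2", "health"), ("PM10", "health")]
print(check_combinations(pairs))
```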
Improve on dataset C C31 - Important comparison between B & C
Improve on dataset C C31 - Important comparison between B & C. Some bugs detected for B_preliminary & C_preliminary
Improve on dataset C: C41 (new) checks the classification and the date of the Classification Report; C42 (new) checks that the Classification Report is included
Improve on dataset D: Initial key check on localId. It needs evaluation to ensure it captures wrong updates of localId(s)
Improve on dataset D: D41 to D44 internal cross-checks within D: All information MUST be provided within the same XML; Will become a blocker. D52 (checks that if AQD_Used = True it is used in C) was moved to C
Improve on dataset D: D67 to D70 information on equivalence, QA/QC report…: all left as warnings. D78 inlet height: left as a warning
Improve on E1a: Full QA was implemented for E1a at the end of 2016. Automatic QA attempts to catch errors/bugs before processing data. More by Peter Kjeld (EEA)
Improve on E1a: Especially for consistency of the Observing Capability combination (SPO-SPP-SAM-Pollutant); Correct codelist
Improve on E1a – LATEST!! Following current feedback process, what’s new? An important check to prevent data processing errors
Improve on E1a – LATEST!! Hydrogen sulphide (air)
Improve on E1a: E30 to take into account country-specific ranges, as in previous checks in DEM. Investigating whether some statistical checks could be done on CDR: Negative/Zero annual averages, unexpected range…
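A statistical check on annual averages of the kind under investigation might look like the sketch below. The function name, the returned labels and the threshold values are hypothetical; country-specific ranges would have to be supplied per country and pollutant.

```python
def flag_annual_average(value, expected_min, expected_max):
    """Flag an annual average that is non-positive or outside the expected range.

    Returns a short label for the issue, or None if the value looks plausible.
    """
    if value <= 0:
        return "negative_or_zero"
    if value < expected_min or value > expected_max:
        return "unexpected_range"
    return None


# Hypothetical range, for illustration only.
for average in (-1.0, 0.0, 20.0, 500.0):
    print(average, flag_annual_average(average, expected_min=0.5, expected_max=200.0))
```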
Improve on dataset G: New checks: Ensure that all Assessment Regimes in C are included in G (with some exceptions); Complete delivery versus B and C delivery. Some key checks are still warnings (pay attention)
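The completeness check of G against C amounts to a set difference, sketched below. The function name and the identifiers are hypothetical, and the `exceptions` argument stands in for the unspecified exceptions mentioned above.

```python
def regimes_missing_in_g(c_regime_ids, g_regime_ids, exceptions=()):
    """Return Assessment Regimes reported in C that are absent from G.

    Regimes listed in exceptions are not required to appear in G.
    """
    return sorted(set(c_regime_ids) - set(g_regime_ids) - set(exceptions))


# Hypothetical identifiers, for illustration only.
print(regimes_missing_in_g(["ARE-1", "ARE-2", "ARE-3"], ["ARE-1"], exceptions=["ARE-3"]))
```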
Further tier-2 checks: After automatic checks, other important checks are performed: Tier-2 checks on geometries (B); Tier-2 checks on E1a; Process & correct upload to database
Improve on E2a – under development - reminder https://tableau.discomap.eea.europa.eu/t/Aironline/views/QAQC_E2a_FINALv1_clean/Outlierdetectionmap?:embed=y&:showShareOptions=true&:display_count=no&:showVizHome=no
Any Questions? EEA-ETC/ACM contact: Jaume Targa (4sfera Innova) Principal consultant m. +34 679 38 01 01 e. jaume.targa@4sfera.com skype jaume.targa.4sfera twitter @4sfera web www.4sfera.com You can follow us on twitter @4sfera