Screening for Abnormal Values in AirBase Datasets

Presentation transcript:

Screening for Abnormal Values in AirBase Datasets. Oliver Kracht and Michel Gerboles, European Commission - Joint Research Centre, I-21026 Ispra (VA), www.jrc.ec.europa.eu. 18th EIONET Workshop on Air Quality Assessment and Management, 24th and 25th October 2013, Dublin, Ireland.

Objectives: Present a prototyped screening tool for abnormal values and uncertain classifications of ambient air quality monitoring stations. Methodology: “Smooth Spatial Attribute Method” (first developed for traffic sensors by Lu et al. 2003 & Shekhar et al. 2003). Applications: AirBase records of daily PM10 values. 22 February 2019

Data availability in AirBase: Public air quality database system of the European Environment Agency (EEA). Monitoring data submitted by about 35 participating countries throughout Europe. 140 pollutants, more than 6000 stations and 25000 time series with hourly and daily data covering more than 30 years.

Focus of this exercise: records with varying time extent from AirBase versions 4 and 7; daily PM10 values; station type “Background”; all area types (urban, suburban and rural; to be discussed).

“Smooth Spatial Attribute Method”: Proposed for traffic sensors by Lu et al. 2003 & Shekhar et al. 2003. 1st: quantify how the measurement value of a station deviates from the corresponding values observed within its spatio-temporal neighbourhood (the ‘Sx value’). 2nd: compare this Sx-deviation to the corresponding Sx-deviations observed for the station’s neighbours. References: Lu, CH.-T., D. Chen & Y. Kou, 2003: Detecting Spatial Outliers with Multiple Attributes. ICTAI'03, IEEE 2003. Shekhar, S., CH.-T. Lu & P. Zhang, 2003: A Unified Approach to Detecting Spatial Outliers. GeoInformatica, 7(2), 139-166.

Definition of Neighbourhood in 3 Dimensions: spatial domain limited to +/- 1 spherical degree; temporal domain limited to +/- 2 days; temporal domain is automatically expanded if the initial neighbourhood is too small.
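The neighbourhood definition can be sketched as a simple filter. This is a minimal illustration and not the authors' R code: the `Record` type, the `neighbourhood` function, the reading of "+/- 1 spherical degree" as a longitude/latitude box, the exclusion of the central station's own records, the one-day expansion step and the `max_expand_days` cap are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    station: str   # station code (hypothetical field names)
    lon: float     # longitude in degrees
    lat: float     # latitude in degrees
    day: int       # day index of the daily value
    value: float   # daily PM10 value


def neighbourhood(centre, records, max_deg=1.0, max_days=2,
                  min_points=20, max_expand_days=10):
    """Return (neighbour records, temporal half-width actually used).

    Neighbours are records of other stations within +/- max_deg in
    longitude and latitude and within +/- days of the central record.
    The temporal half-width starts at max_days and is widened one day
    at a time until min_points are found or max_expand_days is reached.
    """
    days = max_days
    while True:
        found = [r for r in records
                 if r.station != centre.station
                 and abs(r.lon - centre.lon) <= max_deg
                 and abs(r.lat - centre.lat) <= max_deg
                 and abs(r.day - centre.day) <= days]
        if len(found) >= min_points or days >= max_expand_days:
            return found, days
        days += 1
```

A caller can compare the returned half-width with `max_days` to see whether the temporal domain had to be expanded for a given record.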

“Smooth Spatial Attribute Method”: Calculation of Sx values (for each individual neighbourhood). z-transformation of Sx using the mean and standard deviation of Sx (Sxn and sSxn) within a neighbourhood. Define a reference basis θ (e.g., by applying a KZ filter to the individual zi time series). Test statistics for abnormal values screening (e.g., threshold value chosen as 1.96).
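The formulas shown on this slide are not preserved in the transcript. The block below is a hedged reconstruction that follows the general form used by Shekhar et al. (2003), where the spatial statistic is the difference between a value and its neighbourhood average. The symbols x_i (log-transformed value at the central station), N_i (its neighbourhood), w_ij (weights) and d_ij (normalized distance) are notation introduced here for illustration, and the exact form used by the authors is an assumption.

```latex
\begin{align}
\bar{x}_{N_i} &= \frac{\sum_{j \in N_i} w_{ij}\, x_j}{\sum_{j \in N_i} w_{ij}},
  \qquad w_{ij} = d_{ij}^{-2} \\
S_{x,i} &= x_i - \bar{x}_{N_i} \\
z_i &= \frac{S_{x,i} - \overline{S_x}_{n}}{s_{S_x n}} \\
\text{flag record } i \text{ if } & \left| z_i - \theta_i \right| > 1.96
\end{align}
```

Here the mean and standard deviation in the third line are the Sxn and sSxn named on the slide, computed over the Sx values of the neighbourhood.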

Example for spatio-temporal outlier screening: 1st step: log transformation of non-Gaussian data. Remark on station codes: AirBase v.4 nomenclature: AT0227A; AirBase v.7 nomenclature: AT30104.

2nd step: Calculate neighbourhood mean (weighted mean using inverse squared normalized Euclidean distance).
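A minimal sketch of such a weighted mean follows. It assumes that "normalized" means each coordinate difference is divided by the half-width of its domain (1 degree, 2 days) before taking the Euclidean norm; that interpretation, the function names and the small `eps` guard against zero distances are assumptions, not details taken from the slides.

```python
import math


def normalised_distance(d_lon, d_lat, d_days, max_deg=1.0, max_days=2.0):
    """Euclidean distance after scaling each axis by its domain half-width."""
    return math.sqrt((d_lon / max_deg) ** 2
                     + (d_lat / max_deg) ** 2
                     + (d_days / max_days) ** 2)


def idw_mean(values, distances, eps=1e-6):
    """Weighted mean with weights 1 / distance**2.

    Distances below eps are clamped to eps so that a neighbour at zero
    distance does not cause a division by zero.
    """
    weights = [1.0 / max(d, eps) ** 2 for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

With this weighting, a neighbour at twice the normalized distance contributes a quarter of the weight.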

3rd step: Calculate Sx within individual neighbourhoods.

4th step: For each station, calculate the weighted mean and weighted standard deviation of Sx values within its neighbourhood (Sxn and sSxn).

5th step: Sx values of the central station are Z-normalised (using the Sxn and sSxn of each neighbourhood).
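Steps 4 and 5 can be sketched as follows. The function names are hypothetical, the weights are supplied by the caller, and the use of the population form of the weighted standard deviation (no bias correction) is an assumption, since the slides do not state which estimator was used.

```python
import math


def weighted_mean_std(values, weights):
    """Weighted mean and weighted (population-form) standard deviation."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(var)


def z_normalise(sx_centre, sx_neighbours, weights):
    """Z-normalise the central station's Sx against its neighbourhood.

    Returns None when the neighbourhood Sx values have zero spread,
    because the z-value is then undefined.
    """
    mean, std = weighted_mean_std(sx_neighbours, weights)
    if std == 0.0:
        return None
    return (sx_centre - mean) / std
```

Returning `None` for an undefined z-value lets later steps treat such records separately instead of silently passing them.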

6th step: The test statistic for abnormal values searches for zi values exceeding the upper/lower limits chosen as a reference (e.g., θ +/- a predefined threshold of 1.96).
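The test in the 6th step amounts to a band check around the reference. The sketch below is an illustration under assumptions: the function name and the three label strings are made up for the example, and missing z or θ values are labelled "non-verifiable" by analogy with the term used elsewhere in the slides.

```python
def flag_abnormal(z, theta, threshold=1.96):
    """Label each z-value relative to the band theta +/- threshold.

    z and theta are equally long sequences; None marks a value that
    could not be computed (for example, too few neighbours).
    """
    flags = []
    for zi, ti in zip(z, theta):
        if zi is None or ti is None:
            flags.append("non-verifiable")
        elif abs(zi - ti) > threshold:
            flags.append("abnormal")
        else:
            flags.append("ok")
    return flags
```

For example, a z-value of 3.0 against a reference of 0.5 lies 2.5 away from θ and is flagged, while 0.5 against a reference of 0.0 is not.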

Threshold criteria applied in the outlier screening: Threshold reference θ obtained from low-pass filtering of each individual station's zi time series. |zi| exceeding 1.96 not taken into account for computing θ. Minimum number of data points required within a spatio-temporal neighbourhood (e.g., 20 neighbourhood points). Minimum number of data points required within a rolling window. Use a Kolmogorov-Zurbenko filter (with m = 5, k = 3) to obtain a smooth reference θ.

Threshold criteria applied in the outlier screening: … Kolmogorov-Zurbenko filter (with m = 5, k = 3) to obtain a smooth reference θ. Removes signal components with a periodicity of less than ca. 8.7 days.
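A Kolmogorov-Zurbenko filter is a centred moving average of window length m applied k times in succession. The sketch below illustrates that idea together with the masking of |zi| > 1.96 described above. The function names, the shortened windows at the series edges and the skipping of missing values inside a window are assumptions for the example. The quoted cut-off of about 8.7 days appears consistent with the commonly cited rule of thumb m·√k = 5·√3 ≈ 8.66, which the last function evaluates.

```python
import math


def mask_extremes(z, limit=1.96):
    """Replace values with |z| > limit by None so they do not affect theta."""
    return [None if (v is None or abs(v) > limit) else v for v in z]


def moving_average(series, m):
    """Centred moving average of odd length m that skips None values."""
    if m % 2 == 0:
        raise ValueError("window length m must be odd")
    half = m // 2
    out = []
    for i in range(len(series)):
        window = [v for v in series[max(0, i - half): i + half + 1]
                  if v is not None]
        out.append(sum(window) / len(window) if window else None)
    return out


def kz_filter(series, m=5, k=3):
    """Apply the moving average of length m repeatedly, k times."""
    out = list(series)
    for _ in range(k):
        out = moving_average(out, m)
    return out


def effective_width_days(m=5, k=3):
    """Rule-of-thumb effective filter width m * sqrt(k), in time steps."""
    return m * math.sqrt(k)
```

A reference series would then be obtained as `kz_filter(mask_extremes(z))` for each station's z time series.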

Final Example Outcome. Remark on station codes: AirBase v.4 nomenclature: AT0227A; AirBase v.7 nomenclature: AT30104.

Threshold criteria applied in the outlier screening: the same criteria as listed above apply. Data points that cannot be tested under these criteria (e.g., too few neighbours) are marked as non-verifiable.

Systematic deviation from neighbourhood

Automated Data Processing: All code prototyped in the R environment. Directly coupled to a PostgreSQL database.

Inherent challenges in the method:

Limited availability of neighbourhood information. Examples: availability of background station records. Example 1: reasonably distributed spatial neighbourhood.

Limited availability of neighbourhood information. Examples: availability of background station records. Example 2: “asymmetric” spatial neighbourhood. Consider investigating transboundary datasets.

Example 3: changing neighbourhood over time.

A changing neighbourhood needs to be dynamically accounted for in the automated data processing (-> done). It may be useful to flag a significant change in the group of stations within a neighbourhood, which could explain a sudden onset of abnormal station values (-> to do).
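One possible way to approach the open "to do" item is to compare the sets of contributing stations between consecutive time steps. The sketch below uses the Jaccard similarity for this. It is only an illustration of the idea and not part of the presented tool; the function names and the similarity cut-off of 0.5 are arbitrary assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of station codes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def flag_neighbourhood_changes(station_sets, min_similarity=0.5):
    """Flag time steps whose neighbour set differs strongly from the previous one.

    station_sets holds one collection of station codes per time step.
    The first time step is never flagged because it has no predecessor.
    """
    if not station_sets:
        return []
    flags = [False]
    for prev, cur in zip(station_sets, station_sets[1:]):
        flags.append(jaccard(prev, cur) < min_similarity)
    return flags
```

Flags raised this way could be reported next to abnormal-value flags so that a sudden onset of abnormal values can be checked against a change in the neighbourhood composition.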

Summary of Outcomes: 2006 / 2007 records of AirBase v.4.

Some more aspects: Reprocessing with longer time series (AirBase v.7)

Influence of station-area type selection: (a) used urban, suburban and rural; (b) used urban and suburban only. The Sx mean and standard deviation of the neighbourhood change, causing a change in the normalization and in the reference.

Preliminary Results and Conclusions: Processed 2006 / 2007 AirBase records of daily PM10 values for a selection of 8 countries (AT, CZ, DE, ED, FR, GB, IT and NL). The share of identified abnormal data points typically ranges between 4% and 10% of the records within each individual country dataset. The number of non-verifiable records typically ranges between 1% and 15% per individual country (limitations of network design, e.g. too few neighbours). Figures on the share of abnormal data points depend on the parameter values chosen in the screening method. An absolute definition of abnormal records is not feasible; it depends on the intended objectives for using the method.

Preliminary Results and Conclusions: Demonstrated extension to longer time series with AirBase v.7. We anticipate that the screening tools can be a useful AirBase post-processing tool for: modellers; preparation of data summaries; spatial and temporal trend analysis; statistical evaluations. They may also support QA/QC with a short feedback cycle for network operators when implemented in real-time or near-real-time mode.

Open Questions: Is there a need to derive a harmonized set of screening tool parameters through collaboration? Or is it better to leave this open to the end-user's choice? Adjustable parameter settings: spatial domain: +/- 1 spherical degree; temporal domain: +/- 2 days; temporal domain automatically expanded if the neighbourhood is too small; test statistics: θ +/- predefined threshold of 1.96; thresholding reference θ obtained from low-pass filtering of the zi time series, with |zi| exceeding 1.96 not taken into account for computing θ; minimum number of data points within a spatio-temporal neighbourhood: 20; minimum number of data points required within a rolling window; KZ filter with m = 5, k = 3.
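If the parameters were left to the end-user, the adjustable settings listed above could be collected in one configuration object. The sketch below shows one hypothetical layout; the class and field names are invented for illustration, the defaults are the values quoted on the slide, and the rolling-window minimum is left as `None` because the slides do not give a number for it.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class ScreeningParams:
    spatial_half_width_deg: float = 1.0      # +/- 1 spherical degree
    temporal_half_width_days: int = 2        # +/- 2 days
    expand_temporal_window: bool = True      # widen if neighbourhood too small
    z_threshold: float = 1.96                # band around theta, also mask limit
    min_neighbourhood_points: int = 20       # minimum points per neighbourhood
    min_rolling_window_points: Optional[int] = None  # value not stated in slides
    kz_window_m: int = 5                     # KZ moving-average length
    kz_iterations_k: int = 3                 # KZ number of iterations


# A harmonized default set and a user-specific variant can coexist.
default_params = ScreeningParams()
stricter_params = ScreeningParams(z_threshold=2.58, min_neighbourhood_points=30)
```

Calling `asdict(default_params)` gives a plain dictionary that could be stored alongside screening results to document which settings produced them.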

Open Questions: Identify the circle of interested end-users (EIONET community, modelling community (FAIRMODE), EEA …). Is there a need to derive a harmonized set of screening tool parameters through collaboration? Or is it better to leave this open to the user's choice? How to report results (graphs / tables / quantitative point-by-point information / simple flagging / aggregated statistics)? Structure for future implementations?

Thank you for your attention!

Comparison with a conventional outlier screening approach. Step 1: calculate neighbourhood standard deviation.

“Classical approach”. Step 2: use the Z-score of log-transformed measurement values as an outlier criterion?

Use the Z-score of log-transformed values as an outlier criterion? The Z-score of log-transformed values does not provide a conclusive outlier criterion for this application. Spatial and temporal trends cannot be considered in this way.
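For comparison, the classical criterion can be written in a few lines. The sketch assumes strictly positive measurement values and uses the sample standard deviation; the function name is hypothetical. It standardizes each value against a single mean and spread for the whole set of values passed in, which illustrates why spatial and temporal trends are not accounted for.

```python
import math
import statistics


def log_zscores(values):
    """Z-scores of log-transformed values against one overall mean and std."""
    if any(v <= 0 for v in values):
        raise ValueError("log transformation requires positive values")
    logs = [math.log(v) for v in values]
    mean = statistics.fmean(logs)
    std = statistics.stdev(logs)   # sample standard deviation
    return [(x - mean) / std for x in logs]
```

Because every value is compared with the same overall mean, a seasonal rise shared by all stations and a genuinely abnormal single reading both show up simply as large z-scores.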