Using satellite data and data fusion techniques for air quality mapping (WP4000) Jan Horálek
1. Data fusion methodology 2. Description of WP4000 3. Data used and discussion on technical details
Linear regression model followed by kriging of its residuals (residual kriging)
The supplementary data for the linear regression model (LRM) are selected based on their relation with the measured AQ data.
Kriging is a geostatistical method of spatial interpolation, i.e. it uses knowledge of the spatial structure of the air quality field, expressed by a variogram.
The estimate at a point $s_0$ is the LRM prediction plus the interpolated residual $\hat{\eta}(s_0) = \sum_{i=1}^{N} \lambda_i \eta(s_i)$, where $\eta(s_i)$ are the residuals of the LRM at the stations, $\lambda_i$ are the weights estimated from the variogram, and $N$ is the number of stations.
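The two steps of residual kriging can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used for the maps. It assumes ordinary kriging of the residuals, a spherical variogram with fixed, hand-picked parameters, and synthetic station data. The names `spherical`, `residual_kriging`, `coords`, `X` and `vario` are hypothetical.

```python
import numpy as np

def spherical(h, nugget, sill, range_):
    """Spherical variogram: 0 at h = 0, rising from the nugget to the sill at h = range_."""
    h = np.asarray(h, dtype=float)
    rising = nugget + (sill - nugget) * (1.5 * h / range_ - 0.5 * (h / range_) ** 3)
    return np.where(h == 0.0, 0.0, np.where(h < range_, rising, sill))

def residual_kriging(coords, z, X, coords0, X0, variogram):
    """Return the LRM prediction plus the ordinary-kriged LRM residual at the targets."""
    n, m = len(z), len(coords0)
    # Step 1: fit the linear regression model at the stations (least squares).
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    eta = z - A @ coef                       # residuals eta(s_i) at the stations
    # Step 2: ordinary kriging system built from the variogram
    # (the last row and column enforce that the weights sum to one).
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = variogram(d)
    K[n, n] = 0.0
    d0 = np.linalg.norm(coords[:, None, :] - coords0[None, :, :], axis=2)
    rhs = np.ones((n + 1, m))
    rhs[:n, :] = variogram(d0)
    lam = np.linalg.solve(K, rhs)[:n, :]     # weights lambda_i, one column per target
    # Step 3: add the interpolated residual to the regression prediction.
    A0 = np.column_stack([np.ones(m), X0])
    return A0 @ coef + lam.T @ eta, lam

# Synthetic example (all numbers are invented for illustration).
gen = np.random.default_rng(0)
coords = gen.uniform(0.0, 100.0, size=(15, 2))      # station coordinates
X = gen.uniform(5.0, 30.0, size=(15, 1))            # one supplementary variable
z = 3.0 + 0.8 * X[:, 0] + gen.normal(0.0, 1.0, size=15)
vario = lambda h: spherical(h, nugget=0.2, sill=1.0, range_=40.0)
z_at_stations, weights = residual_kriging(coords, z, X, coords, X, vario)
```

When the targets coincide with the stations, as in this example, kriging reproduces the measured values exactly, which is a convenient sanity check of the linear system.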
Linear regression model followed by kriging of its residuals (residual kriging) – continuation
Variogram: a measure of spatial correlation, with the parameters sill, nugget and range.
The empirical variogram needs to be fitted by an analytical function, e.g. a spherical one.
For some pollutants, both monitoring and modelling data may be logarithmically transformed, because these data are lognormally distributed.
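As a rough illustration of the variogram fitting, the sketch below computes an empirical variogram from station pairs and fits a spherical model to it. The brute-force search over candidate parameter values is a simplification chosen for brevity, not the fitting procedure used in the project. All function names and candidate values are hypothetical.

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Mean of 0.5 * (z_i - z_j)^2 over station pairs, per distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    i, j = np.triu_indices(len(values), k=1)           # every pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    semivar = 0.5 * (values[i] - values[j]) ** 2
    which = np.digitize(dist, bin_edges) - 1           # bin index of each pair
    gamma = np.full(len(bin_edges) - 1, np.nan)        # NaN marks empty bins
    for b in range(len(gamma)):
        if np.any(which == b):
            gamma[b] = semivar[which == b].mean()
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centers, gamma

def spherical_model(h, nugget, sill, range_):
    """Spherical variogram: 0 at h = 0, nugget just above 0, sill for h >= range_."""
    h = np.asarray(h, dtype=float)
    rising = nugget + (sill - nugget) * (1.5 * h / range_ - 0.5 * (h / range_) ** 3)
    return np.where(h == 0.0, 0.0, np.where(h < range_, rising, sill))

def fit_spherical(centers, gamma, nuggets, sills, ranges):
    """Pick the candidate (nugget, sill, range) with the smallest squared error."""
    ok = ~np.isnan(gamma)
    best, best_err = None, np.inf
    for n0 in nuggets:
        for s in sills:
            for r in ranges:
                if s <= n0:                            # the sill must exceed the nugget
                    continue
                err = np.sum((spherical_model(centers[ok], n0, s, r) - gamma[ok]) ** 2)
                if err < best_err:
                    best, best_err = (n0, s, r), err
    return best

# Example: recover known parameters from noise-free variogram values.
centers = np.linspace(5.0, 95.0, 10)
target = spherical_model(centers, 2.0, 10.0, 50.0)
params = fit_spherical(centers, target,
                       nuggets=[0.0, 1.0, 2.0, 3.0],
                       sills=[8.0, 10.0, 12.0],
                       ranges=[25.0, 50.0, 75.0])
```

In practice a continuous optimiser would replace the grid search. The grid keeps the example dependent on NumPy only.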
Routine evaluation
Cross-validation: the spatial interpolation is calculated for every measurement point using all available information except the measurement at the point in question. The estimated values are compared with the measured ones in a scatter plot (including R² and the regression equation) and by statistical indicators, especially RMSE and bias (MPE), occasionally also MAE and others.
$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{Z}(s_i) - Z(s_i)\right)^2}$, $bias = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{Z}(s_i) - Z(s_i)\right)$
where $Z(s_i)$ is the measured value at the point $s_i$, $\hat{Z}(s_i)$ is the estimate at the point $s_i$ obtained from the other points, and $N$ is the number of stations.
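The leave-one-out procedure and the two main indicators can be sketched as follows. To keep the example short, a simple inverse-distance-weighting interpolator (`idw_predict`, a hypothetical stand-in) is used in place of residual kriging. Any function with the same signature could be passed instead. The station data are synthetic.

```python
import numpy as np

def idw_predict(coords, values, target, power=2.0):
    """Stand-in interpolator: inverse-distance-weighted mean of the station values."""
    d = np.linalg.norm(coords - target, axis=1)
    w = 1.0 / d ** power
    return np.sum(w * values) / np.sum(w)

def leave_one_out(coords, values, predict):
    """Estimate each station from all the other stations."""
    n = len(values)
    est = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # drop the point in question
        est[i] = predict(coords[keep], values[keep], coords[i])
    return est

def rmse(est, obs):
    """Root mean squared error of the estimates."""
    return float(np.sqrt(np.mean((est - obs) ** 2)))

def bias(est, obs):
    """Mean prediction error: positive values mean overestimation."""
    return float(np.mean(est - obs))

# Synthetic example (invented numbers, for illustration only).
gen = np.random.default_rng(1)
coords = gen.uniform(0.0, 100.0, size=(20, 2))
obs = 20.0 + 0.05 * coords[:, 0] + gen.normal(0.0, 1.0, size=20)
est = leave_one_out(coords, obs, idw_predict)
print(f"RMSE = {rmse(est, obs):.2f}, bias = {bias(est, obs):.2f}")
```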
PM10 annual average 2010 – rural areas
In-situ data; EMEP model
Linear regression model (log-transformed):
Supplementary data            adj. R²   SEE
EMEP                          0.33      0.324
EMEP, altitude                0.41      0.306
EMEP, altitude, wind speed    0.44      0.295
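The two regression statistics in the table can be computed as in the following sketch. It assumes that SEE denotes the standard error of the estimate and that only the concentrations (measured and modelled) are log-transformed. Neither is stated on the slide. The data are synthetic and the name `fit_lrm` is hypothetical.

```python
import numpy as np

def fit_lrm(y, X):
    """Least-squares fit of y on X; returns coefficients, adjusted R2 and SEE."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, p = X.shape                                   # stations, predictors
    A = np.column_stack([np.ones(n), X])             # intercept + predictors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    see = np.sqrt(ss_res / (n - p - 1))              # standard error of the estimate
    return coef, float(adj_r2), float(see)

# Synthetic example (invented numbers): log-transformed concentrations.
gen = np.random.default_rng(2)
model = gen.uniform(5.0, 30.0, size=50)              # modelled concentration
altitude = gen.uniform(0.0, 1500.0, size=50)         # station altitude in metres
measured = np.exp(0.5 + 0.8 * np.log(model) - 0.0002 * altitude
                  + gen.normal(0.0, 0.2, size=50))
coef, adj_r2, see = fit_lrm(np.log(measured),
                            np.column_stack([np.log(model), altitude]))
```

Adding a predictor always raises the plain R², so comparing rows of such a table by adjusted R² and SEE, which both penalise extra predictors, is the fairer choice.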
PM10 annual average 2010 – rural areas
Rural map (applicable to rural areas only)
Cross-validation: RMSE = 4.5 µg·m⁻³, bias = 0.2 µg·m⁻³
PM10 annual average 2010 – urban areas
In-situ data; EMEP model
Linear regression model (log-transformed):
Supplementary data            adj. R²   SEE
EMEP                          0.38      0.292
PM10 annual average 2010 – urban areas
Urban map (applicable to urban areas only)
Cross-validation: RMSE = 6.6 µg·m⁻³, bias = -0.1 µg·m⁻³
PM10 annual average 2010 Final merged map
Analysis of the use of different dispersion models
Outputs of different models, PM10, annual average 2009
Statistical indicators against in-situ data at rural stations
Different results for different dispersion models
Analysis of the use of different dispersion models
Data fusion (DF) mapping using different models, PM10, annual average 2009, rural
Statistical indicators using cross-validation at rural stations
Similar bias (in fact none) for mapping using different dispersion models
1. Data fusion methodology 2. Description of WP4000 3. Data used and discussion on technical details
WP4000 overall description
Aim: to examine, test and apply data fusion techniques for combining satellite datasets related to air quality with ground-level observations and chemical transport model information.
Data fusion combines three data inputs:
in-situ measurement data (highly accurate point-based observations, but with large spatial gaps);
chemical transport model data (spatially continuous, but with high uncertainties and, for some pollutants, bias);
satellite data (near real time observations of spatial patterns).
Combination of in-situ, model and satellite data Source: Schneider et al. (2012). ETC/ACM Technical Paper 2012/9.
WP4000 overall description
Work sub-packages:
4100 Implementation of the data flow
4200 Data fusion based on the historical data
4300 Data fusion based on the near real time data
4400 Web presentation of the maps
Leading organization: CHMI
Partner organizations: IDEA-ENVI (4100, 4400), NILU
WP4000 schedule
1. Data fusion methodology 2. Description of WP4000 3. Data used and discussion on technical details
In-situ monitoring data used
Historical data: AirBase/e-reporting database (Europe); CAQR (Czech Republic); … other national data?
Near real time data: AirBase/e-reporting database (Europe)
Pollutants: PM10, PM2.5, NO2, SO2
Time step: annual (all), daily (all?), hourly (all?)
Year(s): 2015 (Czech Republic only); 2014(?) + ? (in coordination with other WPs)
Modelling data used
Models:
CAMx (4.7 x 4.7 km, Czech Republic): runs on an HPC cluster with 480 cores (30 TB disk space); coupled with ALADIN
WRF-Chem (? x ? km/deg., major part of Europe): Which data format? Which time step? Hourly?
Near real time data: How will the WRF-Chem data be transmitted?
Pollutants: PM10, PM2.5, NO2, SO2
Year(s): 2015 (Czech Republic only); 2014(?) + ? (in coordination with other WPs)
Satellite data used
Satellite data:
NO2, SO2: Which satellite data? Which data format? Which time step? Which geographical coverage?
PM2.5 (+ PM10?)
Near real time data: How will the satellite data be transmitted?
Year(s): ? (in coordination with other WPs)
Web presentation of the maps
Historical data; near real time data
Thank you for your attention.