Validation of WStatR-Data


Validation of WStatR-Data. Jürgen Gonser, ARGUS, Berlin. ESTP – Training on waste statistics, 24th/25th April 2012

What is Data Validation? Data validation shall ensure that the final (published) data correspond to a number of quality characteristics, in particular accuracy, coherence and comparability. Data validation encompasses: establishing a set of checking / validation rules; detecting outliers or potential errors; communicating the detected problems to the “actor” in the best position to investigate the anomaly.

Structure of Presentation: overview of the validation process; types of validation checks; validation checks for waste generation; validation checks for waste treatment; compilation of clarification requests; outlook.

Validation Process for WStatR-Data. 1. Technical check during data upload: completeness, consistency of totals. 2. Quick evaluation (within 2 months): internal coherence, development over time at a very aggregate level. 3. Validation: analysis of time series, cross-country comparison, cross-checks with other data. The slide diagram also shows the documents exchanged along the way: an evaluation report, clarification requests and country replies.

Validation Checks. Automatic checks are based on established checking rules, use defined thresholds, and are implemented in an MS Access database. “Visual” checks are needed for: selection of relevant potential errors; assessment of SDI indicators (non-mineral waste, hazardous waste); aspects not covered by automatic checks (e.g. waste-related view on the data, …). In addition: cross-checks with other waste data.

Waste Generation: Validation Checks. Checks used for the validation of waste generation data: comparison with previous year; comparison across countries; ranking of waste categories by activities; cross-checks with other data sets (WSR, ELV, WEEE).

Waste Generation: Comparison over time

Waste Generation: Cross-Country Comparison. Comparison with other countries and identification of assumed outliers is done on the basis of the interquartile range. Interquartile range (IQR): a common measure of statistical variation; the difference between the 25% quartile (Q1) and the 75% quartile (Q3). Outliers: values that deviate from the upper or lower quartile by more than 1.5 interquartile ranges. Advantage: the validation thresholds reflect sector-specific waste intensity and the variation of waste generation across countries.
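The IQR rule described on this slide can be sketched in a few lines of Python. This is only an illustration, not the MS Access implementation used for the validation: the function name `iqr_outliers`, the country labels, the intensity values and the quartile convention (`method="inclusive"`) are all assumptions.

```python
from statistics import quantiles


def iqr_outliers(values_by_country, k=1.5):
    """Return countries whose value lies more than k * IQR below Q1 or above Q3."""
    values = list(values_by_country.values())
    # Quartile convention is an assumption; other conventions shift Q1/Q3 slightly.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return sorted(
        country
        for country, value in values_by_country.items()
        if value < lower or value > upper
    )


# Made-up waste intensities for one sector (e.g. kg of waste per unit of GVA).
intensity = {"A": 10, "B": 12, "C": 11, "D": 13, "E": 12, "F": 40, "G": 11}
print(iqr_outliers(intensity))  # ['F']
```

Because the thresholds are derived from the reported values of each sector, a sector with naturally high variation gets wider limits than a homogeneous one.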

Waste Generation: Cross-Country Comparison. Box-whisker plot: 50% of all values lie between Q1 and Q3 (within the IQR). The upper whisker represents the highest value still within 1.5 IQR above Q3. The lower whisker represents the lowest value still within 1.5 IQR below Q1.
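As a rough sketch of how the box and whisker positions follow from the data, the Python below computes them for a made-up sample. The function name `box_whisker` and the input values are hypothetical, and the quartile convention is again an assumption.

```python
from statistics import quantiles


def box_whisker(values, k=1.5):
    """Compute box (Q1, median, Q3), whisker ends and outliers for a sample."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme values that are still within the fences.
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = box_whisker([10, 12, 11, 13, 12, 40, 11])
print(stats)
```

For this sample the whiskers end at 10 and 13, and the value 40 is reported as an outlier rather than stretching the upper whisker.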

Waste Generation: Cross-Country Comparison

Waste Generation: Cross-Country Comparison. Hazardous waste total / gross value added: distribution by sectors (figure labels: petroleum industry, metal industry, chemical industry).

Generation: Ranking of waste categories. Waste categories are ranked according to the generated amounts, for each sector and for each country. Result: sector-specific waste generation profiles. The profile shows which waste categories are usually most important in the sector and which are uncommon for the sector (potential errors). Sector profiles are compiled separately for hazardous and for non-hazardous waste (in 1000 tonnes and in kg/inhabitant).
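One possible way to build such a profile is sketched below. The record layout, the function names, the country codes, the category names and the amounts are all invented for illustration, and the rule for calling a category "uncommon" (reported by fewer than `min_countries` countries) is an assumption rather than the rule used in the actual validation.

```python
from collections import defaultdict


def rank_categories(records):
    """Rank waste categories by generated amount for each (country, sector)."""
    amounts = defaultdict(dict)
    for country, sector, category, amount in records:
        amounts[(country, sector)][category] = amount
    return {
        key: sorted(cats, key=lambda c: (-cats[c], c))  # largest amount first
        for key, cats in amounts.items()
    }


def sector_profile(rankings, sector):
    """Collect, per category, the ranks it takes across countries in one sector."""
    profile = defaultdict(list)
    for (_country, sec), ordered in sorted(rankings.items()):
        if sec == sector:
            for rank, category in enumerate(ordered, start=1):
                profile[category].append(rank)
    return dict(profile)


def uncommon_categories(profile, min_countries=2):
    """Categories reported by fewer than min_countries countries (assumed rule)."""
    return sorted(c for c, ranks in profile.items() if len(ranks) < min_countries)


# Made-up non-hazardous waste amounts (kg per inhabitant) in one sector.
records = [
    ("AT", "F", "mineral waste", 900.0),
    ("AT", "F", "wood waste", 50.0),
    ("BE", "F", "mineral waste", 700.0),
    ("BE", "F", "wood waste", 30.0),
    ("BE", "F", "animal waste", 5.0),
]
profile = sector_profile(rank_categories(records), "F")
print(profile)
print(uncommon_categories(profile))
```

A category that appears in the profile of only one country, or that ranks far higher in one country than elsewhere, is a candidate for a clarification request.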

Generation: Ranking of waste categories. Non-hazardous waste reported by all countries in NACE F in 2008 by waste categories (kg per inhabitant).

Waste Treatment: Validation Checks. Checks used for the validation of waste treatment data: comparison with previous year; comparison with generated amounts; comparison with treatment capacities (for incineration only); cross-checks with other data sets (packaging waste).

Waste Treatment: Comparison over Time. Indicator: share of amount treated compared with previous year. The comparison is carried out for the treated total of all treatment categories and for the treated total of each of the 5 treatment categories. Thresholds: lower threshold 80% compared to previous year; upper threshold 120% compared to previous year.
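The year-on-year test can be illustrated with the following sketch. The function name, the treatment category labels and the amounts are hypothetical (the slide does not name the 5 treatment categories), and it is assumed here that values exactly on a threshold are not flagged.

```python
def year_on_year_flags(current, previous, lower=0.80, upper=1.20):
    """Flag treatment amounts that fall outside 80%..120% of the previous year."""
    flags = {}
    for category, amount in current.items():
        baseline = previous.get(category)
        if not baseline:
            # Missing or zero previous-year value: the ratio cannot be computed.
            flags[category] = "no baseline"
            continue
        ratio = amount / baseline
        if ratio < lower:
            flags[category] = "below lower threshold"
        elif ratio > upper:
            flags[category] = "above upper threshold"
        else:
            flags[category] = "ok"
    return flags


# Made-up treated amounts in 1000 tonnes; category names are placeholders.
previous = {"total": 1000, "landfill": 400, "recycling": 500, "incineration": 100}
current = {"total": 1010, "landfill": 300, "recycling": 580, "incineration": 130}
flags = year_on_year_flags(current, previous)
print(flags)
```

In this example the total looks stable, while landfill (75% of the previous year) and incineration (130%) would be flagged, which shows why the test is applied per treatment category as well as to the total.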

Waste Treatment: Comparison with Generation. Assumption: the treated total is similar to the generated total. Two approaches: 1. Share of treated total compared to generated total, for total waste (nhaz and haz) and for total hazardous waste. 2. Same as 1. but corrected for imports and exports with data on waste shipments (Basel); possible for hazardous waste only. Thresholds: lower threshold 80% of generated amount; upper threshold 115% of generated amount.
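Both approaches can be expressed with one small helper, sketched below. The slide does not give the correction formula, so it is assumed here that the amount available for domestic treatment is generation plus imports minus exports; the function names and figures are likewise hypothetical.

```python
def treated_share(treated, generated, imports=0.0, exports=0.0):
    """Share of treated waste relative to the amount available for treatment.

    Assumption: available amount = generated + imports - exports.
    With imports and exports left at 0 this is the uncorrected approach 1.
    """
    available = generated + imports - exports
    if available <= 0:
        return None
    return treated / available


def check_share(share, lower=0.80, upper=1.15):
    """Compare a treated share with the 80% and 115% thresholds."""
    if share is None:
        return "not assessable"
    if share < lower:
        return "below lower threshold"
    if share > upper:
        return "above upper threshold"
    return "ok"


# Made-up hazardous waste figures in 1000 tonnes.
uncorrected = treated_share(treated=700, generated=1000)
corrected = treated_share(treated=700, generated=1000, exports=200)
print(check_share(uncorrected), check_share(corrected))
```

Here the uncorrected share of 70% would trigger a flag, while the correction for 200 thousand tonnes of exports brings the share to 87.5%, inside the accepted range.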

Waste Treatment: Comparison with Capacity. Assumption: the treated total is equal to or lower than the available capacity. The test is applied to the total amount (nhaz and haz) treated by incineration with energy recovery and by incineration without energy recovery. Thresholds: upper threshold, waste treated amounts to 100% of the reported treatment capacity; lower threshold, not applied.
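A minimal sketch of this one-sided test follows; the function name, the dictionary keys and the amounts are invented for illustration.

```python
def exceeds_capacity(treated, capacity):
    """True if the treated amount is above the reported capacity.

    Returns None when no capacity was reported, since the test cannot be applied.
    There is deliberately no lower threshold.
    """
    if capacity is None:
        return None
    return treated > capacity


# Made-up amounts in 1000 tonnes per year: (treated, reported capacity).
incineration = {
    "with energy recovery": (520, 500),
    "without energy recovery": (80, 120),
}
results = {name: exceeds_capacity(t, c) for name, (t, c) in incineration.items()}
print(results)
```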

Clarification Requests. 1. List of identified potential errors (result: 1874 “errors”). 2. Elimination of insignificant “errors” (result: 1206 “errors”). 3. Search for explanations for identified “errors” in the quality reports, the results of the quick evaluation and the results of the previous validation (result: 1022 “errors”). 4. Requests for clarification (1022 “errors” summarised in 476 questions).

Clarification Requests

Outlook. Changes from the WStatR revision have to be incorporated into the validation; the revision enhances the possibilities for validation (e.g. comparison of generation and treatment by selected waste categories, ..). Envisaged improvements: more focus on tests related to waste categories; improved tests for the validation of time series; more attention to low values, missing values and zero values.

Outlook. After this presentation you can anticipate some of the requests for clarification you will most likely receive. We are looking forward to finding the respective explanations in the quality reports!