Assessing Quality for Integration Based Data M. Denk, W. Grossmann Institute for Scientific Computing.



Contents
- Introduction
- Data Generating Processes
- Data Quality for Integration Based Production
- Assessing Quality for Integration Based Data
- Conclusions

Introduction – Aspects of Quality
Quality is discussed from two different points of view:
- The Processing View: What methods can be used in the production of statistics?
  - Specific statistical techniques for specific statistics
  - Development of models of best practice or standards
- The Reporting View: What should quality reports look like?

Introduction – Reporting View
- Numerous formats for quality reports: SDDS, DQAF, FedStats, StatCan, ...
- The proposals are organized according to so-called hyperdimensions
  - For example ESS: Institutional Arrangements, Core Statistical Processes, Dimensions for Statistical Output
  - Inside the hyperdimensions are so-called quality dimensions: Relevance, Accuracy, Timeliness, Accessibility, ...

Introduction – Reporting View
- There is little agreement about the dimensions
- Possible reason: different methods / levels of conceptualization
  - Concepts as mental entities, e.g. quality dimensions in DQAF
  - Concepts as meanings of general terms, e.g. quality elements in DQAF
  - Concepts as units of knowledge, e.g. quality indicators of DQAF
  - Concepts as abstracts of kinds, attributes or properties, e.g. measurable quantities like sampling error, ...

Introduction – Reporting View
- A stronger matching of the processing and the reporting view seems necessary
  - Starting point can be the attributes and properties of statistical processes necessary for assessing quality
  - From basic quality concepts we build higher level elements by aggregation
- Prerequisite for the definition of the necessary basic quality concepts:
  - Empirical analysis of different production processes
- Final result is a User Oriented Quality Certificate

Data Generating Processes
We can distinguish two broad classes of data generating processes:
- The survey based data generating process
- The integration based data generating process

Data Generating Processes – Survey based
- Most considerations about reporting quality start from the traditional survey process
- Characteristics of the traditional survey process:
  - One well defined target population (e.g. persons)
  - A rather homogeneous method for data collection (e.g. questionnaire)
  - A more or less linear sequence of processing steps (e.g. data cleaning, data editing, data imputation, output)
  - Final output is one output file

Data Generating Processes – Integration based
- Many statistics do not follow such a linear production scheme
  - Examples: indices, numerous balance sheets, National Accounts, ...
- Common characteristic: data are produced from many different sources
- Let us call such processes integration based processes
- Data produced in this way are called integration based data

Data Generating Processes – Integration based
Characteristics of integration based data processing
- Population:
  - The underlying population may be split into segments
    - Example: expenditures for education: government, private enterprises, households
  - Often more than one population is involved, possibly also one population at different times
    - Example: calculation of indices

Data Generating Processes – Integration based
Characteristics of integration based data processing
- Data collection:
  - Data collection is different for different segments and populations
  - Often the collected data are the output of already existing data products
- Main processing activities are alignment procedures that make the different sources comparable
- Output may be a set of organized data files

Data Generating Processes – Workflow View Workflow for Survey Process

Data Generating Processes – Workflow View Workflow for Integration Based Process

Data Quality for Integration Based Production
Two important aspects of data quality:
- Content quality: Are the measured “concepts” really the target “concepts”?
- Production quality: Are the methods used sound?

Data Quality for Integration Based Production – Content Quality
Main reasons for lack of content quality:
- Slight differences in the measurement of the variables (“concepts”) when already existing data are reused
  - Example:
    - Target concept: transport of goods on Austrian rails
    - Measured concept: transport of goods according to data from railway authorities (not taking into account that transport may partly use German rails)
- Slight differences in the definition of the segments in the underlying population

Data Quality for Integration Based Production – Content Quality
- Conclusion: using data already collected for other purposes often gives only proxy variables for the intended variables
- Question: Is this consistent with your mental concept of the term “Non-Sampling Error”?
- Manuals of international organizations are often rather vague with respect to such problems

Data Quality for Integration Based Production – Content Quality
Possible strategies for a solution:
- Statistical models for aligning the concepts
- More detailed description of the concepts, using additional variables that characterize the differences as formal properties of the data
- More detailed description of the underlying populations, using additional variables that characterize the differences
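One very simple instance of a statistical model for aligning concepts is a ratio adjustment. The sketch below is only an illustration under assumptions not made in the slides: it supposes that a small overlap sample exists in which both the proxy variable and the target variable were observed, and that a single proportional factor is adequate. The function names `fit_ratio` and `align` and all numbers are hypothetical.

```python
def fit_ratio(overlap):
    """Estimate a proportional adjustment factor from an overlap sample.

    overlap: list of (proxy, target) pairs observed for the same units.
    Returns sum(target) / sum(proxy).
    """
    proxy_total = sum(p for p, _ in overlap)
    target_total = sum(t for _, t in overlap)
    if proxy_total == 0:
        raise ValueError("proxy values sum to zero; ratio is undefined")
    return target_total / proxy_total


def align(proxy_values, ratio):
    """Rescale proxy values so that they approximate the target concept."""
    return [ratio * p for p in proxy_values]


# Hypothetical overlap sample: the proxy overstates the target concept.
overlap = [(100.0, 90.0), (200.0, 170.0), (50.0, 40.0)]
ratio = fit_ratio(overlap)

# Apply the factor to proxy figures for which no target measurement exists.
aligned = align([700.0, 350.0], ratio)
print(ratio, aligned)
```

A single factor is the crudest possible choice; a regression with additional variables describing the difference between the concepts would correspond more closely to the second and third strategies on the slide.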

Data Quality for Integration Based Production – Processing Quality
Elements of processing quality:
- Quality of the methods used for the different components of the integration based statistic
  - This implies that we do not have one method of collection, one editing, one imputation, ... but many activities of that kind
- Quality of the methods used in the integration process
  - Alignment of variables in order to overcome differences in concepts
  - Standard activities like plausibility checks, editing and imputation necessary for the integration activities

Assessing Quality for Integration Based Data
- If we know the quality of all the components used in the integration process, we have to think about the transmission of quality through the integration steps
- Starting point should be an “Authentic Data System”:
  - All data used in the integration process
  - Quality information about the different data sets of the system
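To make the idea of an “Authentic Data System” more concrete, the following sketch shows one possible way to hold the data sets of an integration together with their quality information. The class names, field names and indicator values are assumptions made for illustration; the slides do not prescribe any particular structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceDataset:
    """One data set used in the integration, with its quality information."""
    name: str
    segment: str  # population segment covered by this source
    quality: Dict[str, float] = field(default_factory=dict)  # indicator -> value


@dataclass
class AuthenticDataSystem:
    """All data sets used in the integration process (hypothetical structure)."""
    sources: List[SourceDataset] = field(default_factory=list)

    def add(self, source: SourceDataset) -> None:
        self.sources.append(source)

    def indicator(self, name: str) -> Dict[str, float]:
        """Values of one quality indicator per source; sources lacking it are skipped."""
        return {s.name: s.quality[name] for s in self.sources if name in s.quality}

    def missing(self, name: str) -> List[str]:
        """Names of sources for which the indicator is not documented."""
        return [s.name for s in self.sources if name not in s.quality]


# Hypothetical system for education expenditure, using the segments named earlier.
ads = AuthenticDataSystem()
ads.add(SourceDataset("budget_accounts", "government", {"coverage": 0.98}))
ads.add(SourceDataset("enterprise_survey", "private enterprises", {"coverage": 0.80}))
ads.add(SourceDataset("household_survey", "households"))

print(ads.indicator("coverage"))
print(ads.missing("coverage"))
```

Keeping the quality information next to each source makes gaps in the documentation visible before any quality is compiled or calculated for the integrated product.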

Assessing Quality for Integration Based Data
- Distinguish two types of quality transmission:
  - Quality compilation: methods for representing the quality of the overall product
  - Quality calculations: algorithms for assessing quality
- In both cases we need:
  - Methods for assessing quality
  - Models of best practice / standards

Assessing Quality for Integration Based Data – Quality Compilations
In some cases the best we can do is a better representation of the quality dimensions of the components used:
- Distribution of quality indicators
- Concentration of quality indicators

Assessing Quality for Integration Based Data – Quality Compilations – Example: Coverage for integration based data Structure of integrated sources together with coverage information

Assessing Quality for Integration Based Data – Quality Compilations – Coverage distribution

Assessing Quality for Integration Based Data – Quality Compilations – Coverage concentration with respect to target concept
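The figures for these slides are not part of the transcript, so the following is only a sketch of how a coverage distribution and a coverage concentration might be summarized. It assumes that each source contributes a known share (weight) to the target total and has a known coverage rate; the function names, the threshold-based definition of concentration and all numbers are assumptions, not taken from the presentation.

```python
def coverage_distribution(sources):
    """Summarize coverage across sources.

    sources: list of (name, weight, coverage) tuples, where weight is the
    source's contribution to the target total and coverage lies in [0, 1].
    """
    total = sum(w for _, w, _ in sources)
    rates = [c for _, _, c in sources]
    weighted_mean = sum(w * c for _, w, c in sources) / total
    return {"min": min(rates), "max": max(rates), "weighted_mean": weighted_mean}


def coverage_concentration(sources, threshold):
    """Share of the target total coming from sources with coverage >= threshold."""
    total = sum(w for _, w, _ in sources)
    return sum(w for _, w, c in sources if c >= threshold) / total


# Hypothetical sources for education expenditure with weights and coverage rates.
sources = [
    ("government", 60.0, 0.98),
    ("private enterprises", 25.0, 0.80),
    ("households", 15.0, 0.60),
]

print(coverage_distribution(sources))
print(coverage_concentration(sources, threshold=0.9))
```

The distribution describes how coverage varies across the components, while the concentration measure relates it to the target concept by asking how much of the final figure rests on well covered sources.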

Assessing Quality for Integration Based Data – Quality Calculations
- In most cases the methods will not be formulas but advanced statistical procedures for the different quality dimensions
- Example: measurement of accuracy using variances, standard errors or coefficients of variation. This could be done by using:
  - Bootstrap (e.g. applied for indices by NSO-GB)
  - Simulation techniques
  - Sensitivity analysis (“robustness”)
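As an illustration of the bootstrap option, the sketch below estimates a standard error and a coefficient of variation for a very simple index. It is a generic nonparametric bootstrap over items, not the procedure used by any particular statistical office; the index formula, the function names and the data are assumptions chosen to keep the example short.

```python
import random
import statistics


def ratio_index(pairs):
    """Simple index: 100 * sum of current-period values / sum of base-period values.

    pairs: list of (base_value, current_value) tuples, one per item.
    """
    base = sum(b for b, _ in pairs)
    current = sum(c for _, c in pairs)
    return 100.0 * current / base


def bootstrap_se(pairs, estimator, replicates=1000, seed=0):
    """Bootstrap standard error: resample items with replacement and
    take the standard deviation of the re-computed estimates."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(pairs)
    estimates = []
    for _ in range(replicates):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Hypothetical item values in the base and the current period.
pairs = [(10.0, 11.0), (20.0, 21.0), (15.0, 18.0), (30.0, 30.5), (25.0, 27.0)]

point = ratio_index(pairs)
se = bootstrap_se(pairs, ratio_index)
cv = se / point  # coefficient of variation of the index estimate
print(point, se, cv)
```

For an integration based statistic the resampling scheme would have to reflect how each source was actually collected, which is one reason why such calculations are procedures rather than closed formulas.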

Conclusions
Assessing the quality of integration based statistics needs:
- Clear separation of content based quality and processing based quality
- Better documentation / representation of complex production processes, use of workflow models
- Documentation of the authentic data file
- Definition of best practice / standards for integration processes
- Algorithms for calculating quality dimensions
- Methods for representing quality indicators