Theme (i): New and emerging methods

Katherine Jenny Thompson

UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE
CONFERENCE OF EUROPEAN STATISTICIANS
Work Session on Statistical Data Editing, 24-26 April 2017, The Hague, Netherlands
New perspectives for data editing in the context of new data sources and data integration

New and emerging methods
This topic aims to bring together contributions that include innovative ideas and applications related to various aspects of editing and imputation. Of special interest are methods for the detection and amendment of errors in alternative or new data sources, including administrative and big data, either on their own or in combination with survey data. Contributions should also highlight the expected impact that new methods might have on the statistical agency, including how they contribute to standardizing concepts, terminology, methods and data structures, and to the quality of the agency's data products.

Seven papers 1/2
Correcting for misclassification under edit restrictions in combined survey-register data using Multiple Imputation Latent Class modelling (MILC) – Netherlands. Multi-source data: how to include a covariate when the LC model is constructed using the ‘three-step’ approach.
Simplifying constraints in data editing – Netherlands. How algorithms from Operations Research and Artificial Intelligence can be used to simplify real-life edit sets.
An automatic procedure for selecting weights in kNN imputation – Austria. Selection of proper weights for the distance variables when applying k-Nearest-Neighbour imputation to different kinds of variables.
Imputation methods satisfying constraints – Netherlands. An overview of imputation methods that satisfy edit restrictions and/or constraints due to known or previously estimated totals.
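To make the role of the distance weights in kNN imputation concrete, here is a minimal sketch. It is not the procedure from the Austrian paper: the function names, the weighted Euclidean distance, the mean-of-donors rule and the toy records are all assumptions chosen for illustration.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over the distance variables."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_impute(recipient, donors, donor_targets, weights, k=3):
    """Impute the missing target as the mean target of the k nearest donors."""
    ranked = sorted(
        range(len(donors)),
        key=lambda i: weighted_distance(recipient, donors[i], weights),
    )
    nearest = ranked[:k]
    return sum(donor_targets[i] for i in nearest) / len(nearest)

# Toy donor records (two distance variables) and their observed target values.
donors = [(1.0, 10.0), (2.0, 50.0), (3.0, 20.0), (10.0, 11.0)]
donor_targets = [100.0, 200.0, 300.0, 1000.0]
recipient = (1.5, 12.0)  # record whose target value is missing

# Equal weights versus ignoring the second distance variable.
equal = knn_impute(recipient, donors, donor_targets, weights=(1.0, 1.0), k=2)
first_only = knn_impute(recipient, donors, donor_targets, weights=(1.0, 0.0), k=2)
print(equal, first_only)
```

Changing the weights changes which donors count as nearest, and therefore the imputed value, which is why an automatic way of choosing them matters.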

Seven papers 2/2
Evaluating the quality of business survey data before and after automatic editing – Netherlands. Evaluating the measurement quality of automatic editing methods by modelling the residual measurement errors in the data.
Computational estimates of data editing related variance – Netherlands. The process of data editing is considered to be part of the estimator. The purpose is to estimate the total variance of the estimator, including the stochastic contribution due to the editing steps.
A comparison of Kokic and Bell and conditional bias methods for outlier treatment – France. Comparing two outlier treatment methods in a simulation study on real data with a complicated sampling design.
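The idea of treating editing as part of the estimator can be sketched with a simple resampling scheme in which the editing step is re-applied inside every replicate. This is a generic bootstrap illustration under stated assumptions, not the method of the Dutch paper: the editing rule (`edit`, with its `upper` threshold), the with-replacement resampling and the toy data are all hypothetical.

```python
import random
import statistics

def edit(values, upper=100.0):
    """Hypothetical editing rule: values above `upper` are treated as errors
    and replaced by the median of the accepted values."""
    accepted = [v for v in values if v <= upper]
    fill = statistics.median(accepted) if accepted else upper
    return [v if v <= upper else fill for v in values]

def estimator(values):
    """Editing is part of the estimator: edit first, then take the mean."""
    return statistics.mean(edit(values))

def bootstrap_variance(values, replicates=500, seed=0):
    """Variance of the edit-then-estimate procedure over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(replicates):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))  # editing redone per replicate
    return statistics.variance(estimates)

raw = [12.0, 15.0, 9.0, 14.0, 950.0, 11.0, 13.0, 10.0]  # toy data, one gross error
point_estimate = estimator(raw)
var_total = bootstrap_variance(raw)
print(point_estimate, var_total)
```

Because the editing step runs inside each replicate, the resulting variance reflects how the edited values change from sample to sample, rather than treating them as fixed observed data.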

Enjoy the presentations!

Discussion 1/3
MILC method with the ‘three-step’ approach: is the covariate considered error-free or not? Could more covariates be included?
Simplifying constraints in data editing: how efficient is it in terms of both cost and time? Despite the simplification of constraints they offer, such algorithms are not often applied in the field of data editing. Why?
Selecting weights in kNN imputation: the Lasso weights are set equal to 1 for all variables with non-zero regression coefficients. Why not differentiate by using the respective coefficients themselves as weights?
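The two weighting options raised in the kNN question can be written side by side. This is only a sketch: the coefficient values are invented for illustration, both function names are hypothetical, and using normalised absolute coefficients as weights presupposes that the distance variables are on comparable (for example standardised) scales.

```python
def binary_weights(coefs, tol=1e-12):
    """Weight 1 for every variable with a non-zero coefficient, else 0."""
    return [1.0 if abs(c) > tol else 0.0 for c in coefs]

def magnitude_weights(coefs):
    """Weights proportional to the absolute coefficients, normalised to sum to 1."""
    total = sum(abs(c) for c in coefs)
    if total == 0:
        return [0.0] * len(coefs)
    return [abs(c) / total for c in coefs]

# Invented Lasso coefficients for three distance variables (illustration only).
coefs = [0.8, 0.0, -0.2]
print(binary_weights(coefs))     # selected variables all count equally
print(magnitude_weights(coefs))  # selected variables count by coefficient size
```

Under the binary scheme the first and third variables contribute equally to the distance. Under the magnitude scheme the first variable dominates.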

Discussion 2/3
An overview of imputation methods: an ‘ideal’ imputation method has not been found, nor does it exist. As always, an imputation method can be good for some purposes but bad for others. Should the choice to impute or to weight be treated just as a practical question?
Evaluating the quality of business survey data: consider the following automatic editing procedure: “in case of potential errors, set the changeable value(s) to that of the most reliable unchangeable input source”. How would this method look under the same evaluation approach? More generally, how can numerical consistency be distinguished from truthfulness?

Discussion 3/3
Data editing related variance: (for the audience) are there experiences with estimating imputation variance at NSIs? What methods are used? Is the proposed method an option?
Outlier treatment: both methods give similar results. Which one is going to be applied? Are there theoretical or practical reasons to choose one over the other?