www.stat.gov.lt Oslo, 24–26 September 2012 Work Session on Statistical Data Editing APPLICATION OF THE DEVELOPED SAS MACRO FOR EDITING AND IMPUTATION AT STATISTICS LITHUANIA

Presentation transcript:

APPLICATION OF THE DEVELOPED SAS MACRO FOR EDITING AND IMPUTATION AT STATISTICS LITHUANIA
Jurga Rukšėnaitė, Chief Specialist, Methodology and Quality Division
Work Session on Statistical Data Editing, Oslo, 24–26 September 2012

TOPICS
I. Methods of detection of errors and outliers
II. Methods of data imputation
III. The use of the developed SAS Macro at Statistics Lithuania (practical example)

I Methods of detection of errors and outliers
For quantitative variables:
1. Universal method
2. Interval method
3. Standard deviation rule
4. Testing of hypothesis

I.1 Universal method

I.2 Interval method

I.3 Standard deviation rule
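
A minimal sketch of one common form of a standard deviation rule, assuming a value is flagged when it lies more than k sample standard deviations from the sample mean. The presentation does not state the formula or threshold that the SAS Macro uses, so the function name `sd_rule_outliers`, the default k = 3, and the sample data are all hypothetical. The sketch is in Python rather than SAS.

```python
from statistics import mean, stdev

def sd_rule_outliers(values, k=3.0):
    """Return indices of values farther than k sample SDs from the sample mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return []  # all values identical: nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]

# Hypothetical quarterly incomes with one suspicious value at the end.
incomes = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 10.0, 12.0, 11.0, 10.0, 12.0, 200.0]
flagged = sd_rule_outliers(incomes, k=3.0)
```

Because an extreme value inflates both the mean and the standard deviation, a rule of this form can miss outliers in small samples; that is one reason to apply it within homogeneous groups.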

I.4 Testing of hypothesis

II Imputation
1. Imputation using distributions
2. Imputation using donors
3. Imputation using models

II.1 Imputation using distributions (1)
Figure 1. Statistical models

II.1 Imputation using distributions (2)
The study variable                 Chosen distribution
Two different values               Bernoulli distribution
Three to eight different values    Discrete random variable
More than eight different values   Continuous distributions (uniform, normal, lognormal, and exponential)
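
The mapping above can be sketched as a small selection rule followed by a random draw. This is an illustration in Python, not the SAS Macro itself: the function names are hypothetical, the handling of a variable with fewer than two distinct values is an assumption, and the sketch does not attempt to choose among the continuous distributions because the slide does not say how that choice is made.

```python
import random
from collections import Counter

def choose_distribution(observed):
    """Map the number of distinct observed values to a distribution family."""
    n_distinct = len(set(observed))
    if n_distinct < 2:
        return "constant"      # assumption: this case is not covered by the table
    if n_distinct == 2:
        return "bernoulli"
    if n_distinct <= 8:
        return "discrete"
    return "continuous"        # uniform, normal, lognormal, or exponential

def impute_categorical(values, rng):
    """Fill None entries by sampling from the observed relative frequencies."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    categories = list(counts)
    weights = [counts[c] for c in categories]
    return [v if v is not None else rng.choices(categories, weights=weights)[0]
            for v in values]

rng = random.Random(2012)      # fixed seed so the draw is reproducible
y4 = [1, 2, 1, None, 1, 2, None, 1]
family = choose_distribution([v for v in y4 if v is not None])
completed = impute_categorical(y4, rng)
```

Sampling from the observed frequencies covers both the Bernoulli case (two categories) and the discrete case (three to eight categories) with the same code.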

II.2 Imputation using donors
• Historical (cold-deck) imputation replaces the missing value of an item with a constant value from an external source (for example, a previous survey).
• Hot-deck imputation replaces missing data with comparable data from the same data set.
• Nearest neighbor imputation replaces missing data with the value of a donor. The donor is the unit closest to the recipient according to a distance function calculated from a set of auxiliary variables.
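
A minimal sketch of nearest neighbor imputation in Python, assuming Euclidean distance on the auxiliary variables and auxiliary values that are complete for every record. The presentation does not say which distance function the SAS Macro uses, and the record layout and function name below are hypothetical.

```python
import math

def nearest_neighbor_impute(records, target, aux):
    """Fill missing `target` values with the value from the closest donor.

    records: list of dicts; a missing target is stored as None.
    aux: names of auxiliary variables, assumed complete for every record.
    """
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no donors available")
    completed = []
    for r in records:
        if r[target] is not None:
            completed.append(dict(r))
            continue
        # Euclidean distance on the auxiliary variables (an assumption).
        donor = min(donors, key=lambda d: math.dist([r[a] for a in aux],
                                                    [d[a] for a in aux]))
        completed.append({**r, target: donor[target]})
    return completed

# Hypothetical enterprises: income is imputed from the closest unit by size.
enterprises = [
    {"employees": 5, "income": 120.0},
    {"employees": 50, "income": 1400.0},
    {"employees": 7, "income": None},
    {"employees": 45, "income": None},
]
result = nearest_neighbor_impute(enterprises, "income", ["employees"])
```

The function returns new records and leaves the input untouched, so the data before and after imputation can be compared.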

II.3 Imputation using models

Practical examples

Example 1. Detection of outliers
• Quarterly statistical survey on short-term statistics on service enterprises
• The study variable is income in each quarter (PAJ3).
• The auxiliary variable is the number of employees.
[Output table]

Example 2. Verification of imputation for quantitative data
The verification table shows the percentage difference between the predicted and the real value.
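
The percentage difference could be computed as in the Python sketch below, assuming it is measured relative to the real value, that is, 100 × (predicted − real) / real. The exact definition used in the verification table is not stated, and the function name and sample pairs are hypothetical.

```python
def percentage_difference(predicted, real):
    """Signed difference between predicted and real value, in percent of the real value."""
    if real == 0:
        raise ZeroDivisionError("percentage difference is undefined when the real value is 0")
    return 100.0 * (predicted - real) / real

# Hypothetical (predicted, real) pairs for units whose true value is known.
pairs = [(110.0, 100.0), (45.0, 50.0), (200.0, 200.0)]
verification = [round(percentage_difference(p, r), 2) for p, r in pairs]
```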

Example 3. Verification of imputation for qualitative data
Simulated data were used. The study variable y4 has two possible values: 1 and 2. There are three auxiliary variables: x1, x2, and x3.

Conclusions and future work
• The SAS Macro program consists of five parts: detection of errors, detection of outliers, imputation using the nearest neighbor method, imputation using models, and imputation using distributions.
• Several training sessions were organized for the employees of Statistics Lithuania, and 37 employees attended. Half of them are using or plan to use the SAS Macro in their work.
• The program was tested using real data. The results showed that the time spent on data editing and imputation was reduced.
• The program not only produces a new data set with imputed values but also calculates several statistics (sample mean and standard deviation before and after imputation), which can be used to assess the quality of imputation.
• The latest improvement to the program allows a stratum variable to be specified, so that errors or outliers can be detected and missing values imputed separately in each stratum, group, or domain.
• The methods programmed so far are the simplest ones; more complex methods for imputation and outlier detection will be added to the program later.
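
The before-and-after statistics computed separately in each stratum can be illustrated with a short Python sketch. It is not the SAS Macro: stratum mean imputation is used only as a placeholder method, and the function name, the (stratum, value) row layout, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def impute_by_stratum(rows):
    """rows: list of (stratum, value or None). Impute the stratum mean and
    report mean and SD before and after imputation for each stratum."""
    groups = defaultdict(list)
    for stratum, value in rows:
        groups[stratum].append(value)
    report = {}
    for stratum, values in groups.items():
        observed = [v for v in values if v is not None]
        fill = mean(observed)  # placeholder method; needs at least one observed value
        completed = [fill if v is None else v for v in values]
        report[stratum] = {
            "mean_before": mean(observed),
            "sd_before": stdev(observed) if len(observed) > 1 else 0.0,
            "mean_after": mean(completed),
            "sd_after": stdev(completed) if len(completed) > 1 else 0.0,
        }
    return report

rows = [("A", 10.0), ("A", 14.0), ("A", None),
        ("B", 100.0), ("B", None), ("B", 120.0)]
report = impute_by_stratum(rows)
```

With mean imputation the stratum mean is unchanged while the standard deviation shrinks, which is the kind of effect the before-and-after comparison is meant to reveal.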

Questions?
