Implementation of the Bayesian approach to imputation at SORS
Zvone Klun and Rudi Seljak, Statistical Office of the Republic of Slovenia
Oslo, September 2012

Presentation transcript:

Implementation of the Bayesian approach to imputation at SORS Zvone Klun and Rudi Seljak Statistical Office of the Republic of Slovenia Oslo, September 2012

EU-SILC data
Data on income and living conditions, covering household members and selected individuals.
From the large number of variables we selected:
Variable to be imputed: PY010G (gross annual income). About 11% of its values were deleted completely at random.
Explanatory variables: PE040 (level of education attained), PL060 (number of hours usually worked per week), AGE (age of person).
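
The deletion step can be illustrated with a minimal sketch. It assumes synthetic, right-skewed income values in place of the real EU-SILC data; the function name `delete_mcar` and all numbers except the 11% rate are hypothetical.

```python
import numpy as np

def delete_mcar(values, rate, rng):
    """Return a copy of `values` with roughly `rate` of the entries set to NaN.

    Each entry is deleted independently with the same probability, so the
    deletion does not depend on any variable (missing completely at random).
    """
    values = np.asarray(values, dtype=float).copy()
    mask = rng.random(values.shape[0]) < rate  # True marks a deleted value
    values[mask] = np.nan
    return values, mask

rng = np.random.default_rng(0)
# Synthetic stand-in for PY010G (assumption, not the real survey data).
income = rng.lognormal(mean=9.5, sigma=0.6, size=10_000)
income_with_gaps, deleted = delete_mcar(income, rate=0.11, rng=rng)
print(f"share deleted: {deleted.mean():.3f}")
```

Keeping the untouched `income` array alongside the version with gaps is what makes a later comparison against the initial data possible.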

Analysis of PY010G
The distribution of PY010G is highly asymmetric.
Analysis by PE040: because PE040 is categorical, we use 5 models of the same form.

Further analysis of PY010G
For each level of education attained: analysis by AGE and PL060.
Example shown: 5th education level.

Model for PY010G
$PY010G = \beta_1 + \beta_2 \cdot PL060 + \beta_3 \cdot AGE + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
In matrix form: $Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
Estimates:
$\hat{\beta} = (X_{obs}^T X_{obs})^{-1} X_{obs}^T Y_{obs}$
$s^2 = (Y_{obs} - X_{obs}\hat{\beta})^T (Y_{obs} - X_{obs}\hat{\beta}) / (n_{obs} - k)$
Example for: PE040=5, AGE=40, PL060=40
[Figure: normal distributions fitted to the data (red) and given by the regression model (green).]
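
The two estimates can be computed as in the following sketch. It uses synthetic data for a single education level; the coefficients, sample size and noise level are assumptions chosen only for illustration, and `ols_estimates` is a hypothetical helper name.

```python
import numpy as np

def ols_estimates(X_obs, y_obs):
    """Least-squares coefficients and residual variance s^2 from observed rows."""
    n_obs, k = X_obs.shape
    # lstsq solves the same normal equations as (X'X)^-1 X'y, more stably.
    beta_hat, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
    resid = y_obs - X_obs @ beta_hat
    s2 = resid @ resid / (n_obs - k)
    return beta_hat, s2

rng = np.random.default_rng(1)
n = 500
hours = rng.uniform(20, 50, n)   # stand-in for PL060
age = rng.uniform(20, 65, n)     # stand-in for AGE
X = np.column_stack([np.ones(n), hours, age])  # columns: intercept, PL060, AGE
true_beta = np.array([1000.0, 300.0, 150.0])   # assumed values
y = X @ true_beta + rng.normal(0.0, 2000.0, n)

beta_hat, s2 = ols_estimates(X, y)
x_new = np.array([1.0, 40.0, 40.0])  # PL060=40, AGE=40
predicted_mean = x_new @ beta_hat
print(beta_hat, s2, predicted_mean)
```

For a respondent with PL060=40 and AGE=40, the model's distribution of income is normal with mean `predicted_mean` and variance estimated by `s2`.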

Bayesian approach
Data and parameters are treated in the same way:
Data: $Y$ (PY010G)
Parameters: $\beta$, $\sigma^2$
The parameters are not fixed values; they have their own probability distributions.

Simulations and multiple imputation
Simulation of parameters:
first draw the variance: $\sigma^2 \mid Y_{obs}, X_{obs} \sim \text{Scaled-Inv-}\chi^2(n_{obs} - k, s^2)$,
then draw the coefficients: $\beta \mid \sigma^2, Y_{obs}, X_{obs} \sim N(\hat{\beta}, (X_{obs}^T X_{obs})^{-1} \sigma^2)$.
Simulation of missing values (multiple imputation):
draw each missing value: $y_{mis,i} \sim N(X_{mis,i}\beta, \sigma^2)$, independently for each missing value ($i = 1, 2, \ldots, n_{mis}$).
5 imputations give almost 98% efficiency (Rubin's formula, for a rate of missing information of about 11%).
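
The draws above can be sketched as follows. This is an illustration on synthetic data, not the SAS implementation used at SORS. The function names are hypothetical, and the efficiency is computed with the usual form of Rubin's formula, $(1 + \gamma/m)^{-1}$, where $\gamma$ is the rate of missing information and $m$ the number of imputations.

```python
import numpy as np

def draw_parameters(X_obs, y_obs, rng):
    """One draw of (beta, sigma^2) from the distributions given above."""
    n_obs, k = X_obs.shape
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    df = n_obs - k
    s2 = resid @ resid / df
    # Scaled-Inv-chi^2(df, s2) draw: df * s2 divided by a chi-square(df) draw.
    sigma2 = df * s2 / rng.chisquare(df)
    cov = xtx_inv * sigma2
    cov = (cov + cov.T) / 2  # remove rounding asymmetry
    beta = rng.multivariate_normal(beta_hat, cov)
    return beta, sigma2

def impute(X_obs, y_obs, X_mis, m, rng):
    """Return an (m, n_mis) array: one row of imputed values per imputation."""
    n_mis = X_mis.shape[0]
    out = np.empty((m, n_mis))
    for j in range(m):
        beta, sigma2 = draw_parameters(X_obs, y_obs, rng)  # new draw each time
        out[j] = X_mis @ beta + rng.normal(0.0, np.sqrt(sigma2), n_mis)
    return out

def relative_efficiency(gamma, m):
    """Rubin's relative efficiency of m imputations."""
    return 1.0 / (1.0 + gamma / m)

rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(20, 50, n), rng.uniform(20, 65, n)])
y = X @ np.array([1000.0, 300.0, 150.0]) + rng.normal(0.0, 2000.0, n)  # assumed
missing = rng.random(n) < 0.11
imputations = impute(X[~missing], y[~missing], X[missing], m=5, rng=rng)
print(imputations.shape, round(relative_efficiency(0.11, 5), 4))
```

Drawing a fresh pair of parameters for every imputation is what carries the parameter uncertainty into the imputed values; reusing $\hat{\beta}$ and $s^2$ for all five would understate the variability between imputations.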

Imputed values Example of 5 imputations for: PE040=5, AGE=40, PL060=40

Evaluation
Comparison of the average gross annual income (initial data: the data before deletion).
Small relative errors.
Relatively narrow 95% confidence intervals.
Poorer results for model 6, because of:
only 58 units,
a high variance from the linear regression (252862689).
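
The slides do not state how the averages and confidence intervals were computed from the 5 imputations. One common choice is to pool the completed-data means with Rubin's combining rules, sketched here on synthetic data as an assumption; `pool_means` and `relative_error` are hypothetical names.

```python
import numpy as np

def pool_means(completed):
    """Pool the mean over m completed data sets (rows) with Rubin's rules.

    Returns the pooled mean, its total variance and the degrees of freedom.
    """
    completed = np.asarray(completed, dtype=float)
    m, n = completed.shape
    q = completed.mean(axis=1)               # mean of each completed data set
    u = completed.var(axis=1, ddof=1) / n    # its within-imputation variance
    q_bar = q.mean()
    w = u.mean()                             # average within-imputation variance
    b = q.var(ddof=1)                        # between-imputation variance
    total_var = w + (1 + 1 / m) * b
    df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, total_var, df

def relative_error(estimate, reference):
    return (estimate - reference) / reference

rng = np.random.default_rng(3)
initial = rng.normal(20_000, 5_000, 2_000)   # synthetic "initial data"
missing = rng.random(initial.size) < 0.11
completed = np.tile(initial, (5, 1))
# Stand-in imputations (assumption): fresh draws replace the deleted values.
completed[:, missing] = rng.normal(20_000, 5_000, (5, int(missing.sum())))

q_bar, total_var, df = pool_means(completed)
rel_err = relative_error(q_bar, initial.mean())
print(q_bar, total_var, df, rel_err)
```

A 95% confidence interval then follows as the pooled mean plus or minus the 0.975 quantile of a t distribution with `df` degrees of freedom times the square root of `total_var`.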

Discussion
The method is effective if the data are described well by the selected model.
The missing-data mechanism is ignorable if the data are missing at random (MAR) and the parameters of the model and the parameters of the missing-data mechanism are distinct (the parameters are independent).
The imputed and explanatory variables have to be numerical.
We tested the method step by step using SAS.
The method is already included in the MCMC procedure in the newer versions of SAS (9.2 and 9.3).
Thank you for your attention!