MODELING of SWIFT & Survey to Survey Imputation


SWIFT modeling
1. Use cross-validation to decide the optimal p-value.
2. Run the stepwise regression using the optimal p-value determined by the cross-validation.
3. Check the coefficients of the final model.
4. Simulate household expenditures using PovMap or MI (multiple imputation).
5. Estimate poverty rates from the simulated expenditures, combining them with the MI formulas (Rubin's rules) through "mi estimate".
6. To check stability, conduct a backward imputation analysis.
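The "MI formula" in step 5 refers to Rubin's rules, which pool the estimates from the M simulated datasets. The sketch below is a minimal Python illustration of those rules, not the Stata implementation; the function name rubin_combine is hypothetical, and it uses the large-sample degrees of freedom (Stata's mi estimate can additionally apply a small-sample adjustment, which this sketch omits).

```python
import math
import statistics


def rubin_combine(estimates, variances):
    """Pool M completed-data estimates with Rubin's rules.

    estimates: one point estimate per imputation (e.g., a poverty rate)
    variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    u_bar = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (M - 1 denominator)
    total = u_bar + (1 + 1 / m) * b          # total variance of the pooled estimate
    if b == 0:
        df = math.inf                        # no spread across imputations
    else:
        # Rubin's large-sample degrees of freedom
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df
```

For example, `est, var, df = rubin_combine(rates, ses_squared)` followed by `math.sqrt(var)` gives the pooled poverty rate and its standard error.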

Steps for SWIFT modeling
Step | Objective | Program
1. Cross-validation | Choose the optimal p-value | crossvalidation.do
2. Finalization of models | Find a model with the optimal p-value | Simulation and estimation.do or PovMap
3. Simulate household expenditure/income | Simulate household expenditures based on the above estimation | Simulation and estimation.do or PovMap
4. Estimate poverty statistics | Estimate poverty rates from the simulated expenditures or income using "mi estimate" | mi estimate
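Step 2 finds the model by stepwise regression with a p-value threshold. The sketch below shows the idea as plain forward selection in Python: at each round, the candidate with the smallest p-value enters if that p-value is below pe. It is an illustration under assumptions, not Stata's stepwise command: it uses a normal approximation for the p-values (Stata's regress uses the t distribution), never removes variables, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def _invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_pvalues(y, columns):
    """OLS with an intercept; two-sided p-values (normal approximation) for each column."""
    n = len(y)
    x = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(x[0])
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = _invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(beta[a] * x[i][a] for a in range(k)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)          # residual variance
    z = NormalDist()
    return [2 * (1 - z.cdf(abs(beta[a]) / math.sqrt(s2 * inv[a][a]))) for a in range(1, k)]


def forward_stepwise(y, candidates, pe):
    """Add, one at a time, the candidate with the smallest p-value, while it is below pe."""
    selected = []
    while True:
        best_name, best_p = None, pe
        for name, values in candidates.items():
            if name in selected:
                continue
            cols = [candidates[s] for s in selected] + [values]
            p_new = ols_pvalues(y, cols)[-1]          # p-value of the newly added variable
            if p_new < best_p:
                best_name, best_p = name, p_new
        if best_name is None:
            return selected
        selected.append(best_name)
```

With pe = 0.02, the threshold chosen in the cross-validation step, a call would look like `forward_stepwise(lnrpcexp, regressors, 0.02)`, where regressors maps variable names to value lists.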

Data preparation
You need two datasets, y0 and y1.
y0: data for developing the models. It needs both the consumption variable and the regression variables (e.g., GLSS6 with lnrpcexp).
y1: data for estimating poverty rates with the models. It needs only the regression variables, defined the same way as in y0 (e.g., SWIFT data, or GLSS6 without lnrpcexp).
Both datasets must contain all variables used in the cross-validation, simulation, and estimation stages.
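The requirement that both datasets carry every variable can be checked before any modeling. The Python sketch below assumes each dataset is represented by its list of column names; the function name check_variables and the example regressors (hhsize, urban) are hypothetical, while lnrpcexp is the outcome named above.

```python
def check_variables(y0_columns, y1_columns, model_vars, outcome="lnrpcexp"):
    """Report which required variables each dataset lacks.

    y0 must contain the outcome plus all regressors; y1 needs only the regressors.
    """
    missing_y0 = (set(model_vars) | {outcome}) - set(y0_columns)
    missing_y1 = set(model_vars) - set(y1_columns)
    return {"y0": sorted(missing_y0), "y1": sorted(missing_y1)}


# Hypothetical example: y1 lacks the "urban" regressor
report = check_variables(["lnrpcexp", "hhsize", "urban"], ["hhsize"], ["hhsize", "urban"])
```

An empty list for both keys means the two datasets are ready for the cross-validation, simulation, and estimation stages.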

crossvaliation2.do
Set the location (path) of the dataset correctly. To change the variable sets, edit the global macros that define the variable groups, such as location.

At what level do you want to do the modeling? To build a model for only urban areas or only one region, modify the program here. Apply the same restriction consistently in the cross-validation and in the simulation/estimation stages.
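One way to keep the modeling level consistent is to define the domain once and apply the same filter to both datasets. The Python sketch below assumes records are dictionaries; the function name restrict and the variable names urban and region are hypothetical (the Region 8 urban domain mirrors the simulation example in this presentation).

```python
def restrict(records, **conditions):
    """Keep only the records that match every condition, e.g. urban=1, region=8."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]


# Define the domain once (hypothetical variable names) and reuse it in every stage
DOMAIN = {"urban": 1, "region": 8}

y0 = [{"urban": 1, "region": 8, "lnrpcexp": 7.1}, {"urban": 0, "region": 8, "lnrpcexp": 6.4}]
y1 = [{"urban": 1, "region": 8}, {"urban": 1, "region": 3}]

train = restrict(y0, **DOMAIN)   # data for cross-validation and model estimation
target = restrict(y1, **DOMAIN)  # data for simulation and poverty estimation
```

Because both stages read the same DOMAIN, the cross-validated model and the simulated poverty rates refer to the same population.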

Defining 10 folds
Randomly divide the sample into 10 folds (subsamples).
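A common way to define the folds randomly is to create balanced labels 1 to 10 and shuffle them, which keeps the fold sizes within one observation of each other. The Python sketch below is an illustration under that assumption, not the code in the do-file; the function name and the seed value are hypothetical.

```python
import random


def assign_folds(n, k=10, seed=12345):
    """Randomly assign n observations to k folds of (nearly) equal size."""
    rng = random.Random(seed)                 # fixed seed makes the split reproducible
    folds = [i % k + 1 for i in range(n)]     # balanced labels 1..k
    rng.shuffle(folds)                        # random order of labels across observations
    return folds


folds = assign_folds(1016)  # e.g., one fold label for each of 1016 households
```

Fixing the seed matters here: every candidate p-value should be evaluated on the same split so that the results are comparable.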

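Loop
For each candidate p-value (pe), loop over the 10 folds: estimate the model on the other nine folds and compare the actual and predicted poverty rates in the held-out fold.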
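The inner loop over folds, for one candidate p-value, can be sketched as below. To stay short, this Python sketch replaces the stepwise model with a single-regressor OLS, predicts the fold's poverty rate as the average of Phi((ln z - xb) / sigma), and computes mse on the held-out fold; these are assumptions about how the columns are produced, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def fit_simple_ols(xs, ys):
    """One-regressor OLS; returns (intercept, slope, residual standard deviation)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(e * e for e in resid) / (len(xs) - 2))
    return intercept, slope, sigma


def cross_validate(xs, ys, folds, log_line):
    """Hold out each fold, fit on the rest, and compare actual vs predicted poverty."""
    rows = []
    for f in sorted(set(folds)):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        a, b, s = fit_simple_ols([xs[i] for i in train], [ys[i] for i in train])
        # Actual headcount ratio in the held-out fold
        poor = fmean(1.0 if ys[i] < log_line else 0.0 for i in test)
        # Expected headcount: average probability that ln(expenditure) falls below the line
        pred = fmean(NormalDist(a + b * xs[i], s).cdf(log_line) for i in test)
        mse = fmean((ys[i] - a - b * xs[i]) ** 2 for i in test)
        rows.append({"fold": f, "poor": poor, "mse": mse,
                     "pred_poor": pred, "absdiff": abs(poor - pred)})
    return rows
```

In the full workflow this loop would run once per candidate p-value, with the stepwise selection inside each fold.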

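Result matrix A
Columns: pe = p-value threshold; fold_ = fold number; poor_ = actual poverty rate; r2 = R-squared; mse = mean squared error; pred_poor = predicted poverty rate; absdiff = |poor_ - pred_poor|

pe    fold_  poor_  r2     mse    pred_poor  absdiff
0.01  1      0.348  0.426  0.382  0.302      0.046
0.01  2      0.316  0.427  0.243  0.248      0.067
0.01  3      0.225  0.434  0.245  0.310      0.085
0.01  4      0.106  0.442  0.228  0.233      0.126
0.01  5      0.238  0.439  0.274  0.281      0.043
0.01  6      0.344  0.460  0.273  0.249      0.095
0.01  7      (cells not recoverable by column: 0.358 0.271 0.307 0.131)
0.01  8      0.154  0.452  0.264  0.244      0.090
0.01  9      (cells not recoverable by column: 0.456 0.279 0.318 0.010)
0.01  10     (cells not recoverable by column: 0.304 0.436 0.252 0.052)
0.02  1      -      -      0.386  0.321      0.027
0.02  2      -      0.437  0.240  0.294      0.022
0.02  3      -      0.466  0.250  0.293      0.068
0.02  4      -      0.450  0.221  0.236      0.129
0.02  5      -      0.445  0.270  0.303      0.065
0.02  6-10   (cells not recoverable by column: 0.392 0.286 0.284 0.155 0.462 0.278 0.098 0.467 0.299 0.009 0.232 0.300 0.004)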
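The absdiff column is the absolute gap between the actual and predicted poverty rates, and the cross-validation criteria are averages of absdiff and mse across folds. The Python sketch below recomputes them for the pe = 0.01 rows of result matrix A whose cells are all available (folds 1 to 6 and 8); because the table is rounded to three decimals, a recomputed absdiff can differ from the printed one by 0.001.

```python
from statistics import fmean

# (fold_, poor_, pred_poor, mse) for pe = 0.01, taken from result matrix A
ROWS_PE_001 = [
    (1, 0.348, 0.302, 0.382),
    (2, 0.316, 0.248, 0.243),
    (3, 0.225, 0.310, 0.245),
    (4, 0.106, 0.233, 0.228),
    (5, 0.238, 0.281, 0.274),
    (6, 0.344, 0.249, 0.273),
    (8, 0.154, 0.244, 0.264),
]

# absdiff = |actual poverty rate - predicted poverty rate| for each fold
absdiff = {fold: abs(poor - pred) for fold, poor, pred, _ in ROWS_PE_001}

# Cross-validation criteria for this p-value: averages across folds
mean_absdiff = fmean(absdiff.values())
mean_mse = fmean(mse for *_, mse in ROWS_PE_001)
```

Repeating these averages for every candidate p-value gives the curves that are compared when choosing the threshold.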

Results of cross-validation
Figures: average absolute difference between actual and projected poverty rates (absdiff); mean squared error (mse)
A p-value of 2 percent is chosen because MSE increases sharply beyond 2 percent.
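The choice above is a judgment read off the MSE curve. One hypothetical way to make that rule explicit is to pick the largest p-value that comes just before the biggest increase in average MSE, as in the Python sketch below; the function name and the rule itself are assumptions, not the procedure used in the original do-file.

```python
def choose_pe(mean_mse_by_pe):
    """Return the p-value just before the largest increase in average MSE.

    mean_mse_by_pe: {p_value: average MSE across folds}
    """
    pes = sorted(mean_mse_by_pe)
    if not pes:
        raise ValueError("no candidate p-values")
    if len(pes) == 1:
        return pes[0]
    # Change in average MSE from each p-value to the next larger one
    jumps = [mean_mse_by_pe[b] - mean_mse_by_pe[a] for a, b in zip(pes, pes[1:])]
    k = max(range(len(jumps)), key=jumps.__getitem__)
    return pes[k]
```

Checking the absdiff curve alongside the result guards against picking a threshold where MSE is flat but poverty predictions drift.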

Simulation and estimation.do
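The simulation step imputes log per-capita expenditure for each household in y1 as the model prediction plus a random residual, then computes a poverty rate for each draw. The Python sketch below illustrates this under stated assumptions: it draws sigma squared from a scaled inverse chi-square distribution and the residuals from a normal distribution, but omits the draw of the coefficients that a proper multiple imputation (or PovMap) also performs. The function name and its parameters are hypothetical.

```python
import math
import random


def simulate_poverty(xb, sigma_hat, df_resid, log_line, weights=None, draws=20, seed=2024):
    """Simulate ln(expenditure) `draws` times and return one poverty rate per draw.

    xb: predicted log expenditure for each household in the target data (y1)
    sigma_hat: residual standard deviation of the model estimated on y0
    df_resid: residual degrees of freedom (n - k) of that model
    log_line: log of the poverty line
    """
    rng = random.Random(seed)                 # fixed seed so the draws are reproducible
    if weights is None:
        weights = [1.0] * len(xb)             # unweighted by default
    total_w = sum(weights)
    rates = []
    for _ in range(draws):
        # sigma^2 drawn as sigma_hat^2 * df / chi2(df); chi2(df) = Gamma(df / 2, scale 2)
        chi2 = rng.gammavariate(df_resid / 2, 2.0)
        sigma = sigma_hat * math.sqrt(df_resid / chi2)
        # Impute ln(expenditure) = xb + e, e ~ N(0, sigma^2), and sum weights below the line
        poor_w = sum(w for mu, w in zip(xb, weights) if rng.gauss(mu, sigma) < log_line)
        rates.append(poor_w / total_w)
    return rates
```

Each element of the returned list is one completed-data poverty rate; the rates, together with their sampling variances, are what "mi estimate" pools across imputations.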


Simulation results
--------------------------------------------------------------
        Over |     Mean   Std. Err.   [95% Conf. Interval]
-------------+------------------------------------------------
      Actual | .2744967    .0458151    .1811929    .3678006
  Simulation | .2817209    .0516824    .1675013    .3959405
--------------------------------------------------------------
Number of observations = 1016
Area = Region 8, Urban
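The table can be read more closely by expressing each confidence-interval half-width in standard errors. The Python sketch below does this arithmetic with the numbers from the table; the multipliers come out at about 2.04 (Actual) and 2.21 (Simulation), both above the normal 1.96, which is presumably because the intervals use t critical values with limited degrees of freedom. The simulated mean differs from the actual one by about 0.007, well inside both intervals.

```python
# Mean, standard error and 95% CI bounds, copied from the simulation results table
results = {
    "Actual": (0.2744967, 0.0458151, 0.1811929, 0.3678006),
    "Simulation": (0.2817209, 0.0516824, 0.1675013, 0.3959405),
}

for label, (mean, se, lo, hi) in results.items():
    multiplier = (hi - mean) / se   # CI half-width measured in standard errors
    print(f"{label}: CI = mean +/- {multiplier:.3f} x SE")

# Gap between the simulated and the actual poverty rate
gap = results["Simulation"][0] - results["Actual"][0]
```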

Thank you! If you have any questions, please contact me at nyoshida@worldbank.org