Sequential Regression: A Method for Multiple Imputations of Missing Data. Seminar of the Working Group on Composite Indices (WGCI), 27 March 2002.


WGCI - Seminar, slide 1.
Sequential Regression: A Method for Multiple Imputations of Missing Data
Seminar of the Working Group on Composite Indices, 27 March 2002
Tanja Srebotnjak, United Nations Statistics Division, 2002

WGCI - Seminar, slide 2. Procedure:
- A multivariate technique for multiply imputing missing data.
- Uses a sequence of regression models.
- Developed by T. E. Raghunathan, J. M. Lepkowski, Peter Solenberger and John Van Hoewyk (University of Michigan).

WGCI - Seminar, slide 3. Procedure (contd.):
- Assume a dataset of dimension $(n \times p)$ with item non-response (missingness).
- Partition the dataset into $n_1$ variables with no missing observations, say $X = (X_1, X_2, \dots, X_{n_1})$, and $(n - n_1)$ variables with missing values, $Y = (Y_1, Y_2, \dots, Y_{n - n_1})$.
- $Y$ is ordered by degree of missingness, from least to most.
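The partition and ordering step can be sketched in a few lines of Python. This is only an illustration: the function name `partition_by_missingness`, the use of `None` as the missing-value marker and the toy variables `A`, `B`, `C` are assumptions, not part of the seminar material.

```python
def partition_by_missingness(data):
    """Split variable names into complete ones (X) and incomplete ones (Y).

    `data` maps a variable name to a list of values; None marks a missing value.
    Y is returned ordered from least to most missing, as on the slide.
    """
    n_missing = {name: sum(v is None for v in col) for name, col in data.items()}
    X = [name for name, k in n_missing.items() if k == 0]
    Y = sorted((name for name, k in n_missing.items() if k > 0), key=n_missing.get)
    return X, Y


# Hypothetical toy data: A is complete, B has two gaps, C has one.
toy = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [None, 5.0, None, 7.0],
    "C": [0.5, None, 1.5, 2.5],
}
X, Y = partition_by_missingness(toy)
print(X, Y)  # ['A'] ['C', 'B']
```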

WGCI - Seminar, slide 4. Procedure (contd.):
- Then the conditional distribution of $Y_i$, $i = 1, 2, \dots, n - n_1$, given the observed values is modeled as a regression of $Y_i$ on $X$, e.g. $Y_1 = X\beta + e$, so that $E(Y_1 \mid X) = X\beta$.
- Missing values are imputed using this model.
- Once $Y_1$ is imputed, it is used as a predictor for $Y_2$, i.e. $X = (X_1, X_2, \dots, X_{n_1}, Y_1)$.

WGCI - Seminar, slide 5. Procedure (contd.):
- The algorithm continues cycling through this series of regression models (using updated predictor sets) until $X = (X_1, X_2, \dots, X_{n_1}, Y_1, \dots, Y_{n - n_1})$.
- Now a new round begins, using the full dataset as predictors for $Y_1$ again, thus updating the regression coefficients.
- The algorithm is repeated until the regression coefficients converge, i.e. until their change falls below a specified margin.
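The regression cycle described so far can be sketched as follows. This is a simplified illustration under stated assumptions, not the IVEware implementation: it uses ordinary least squares only, fills each gap with the conditional mean instead of a random draw, and fits each model on the rows where the target is observed. All names (`solve`, `ols`, `predict`, `sequential_impute`, the toy variables) are hypothetical.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    Z = [[1.0] + list(r) for r in rows]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(A, b)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def sequential_impute(data, tol=1e-6, max_rounds=100):
    """data maps variable name -> list of floats, with None for missing values."""
    names = list(data)
    n = len(data[names[0]])
    miss = {v: [i for i in range(n) if data[v][i] is None] for v in names}
    # Y: incomplete variables, ordered from least to most missing.
    order = sorted((v for v in names if miss[v]), key=lambda v: len(miss[v]))
    filled = {v: list(data[v]) for v in names}
    if not order:
        return filled, 0

    def fit_and_fill(v, preds):
        # Fit on rows where v is observed, then fill the gaps in v.
        obs = [i for i in range(n) if data[v][i] is not None]
        beta = ols([[filled[p][i] for p in preds] for i in obs],
                   [data[v][i] for i in obs])
        for i in miss[v]:
            filled[v][i] = predict(beta, [filled[p][i] for p in preds])
        return beta

    # First pass: the predictor set grows from X by one imputed Y at a time.
    preds = [v for v in names if not miss[v]]
    for v in order:
        fit_and_fill(v, preds)
        preds = preds + [v]

    # Further rounds: each Y_i is regressed on all other variables
    # until the coefficients change by less than tol.
    previous = None
    for rounds in range(1, max_rounds + 1):
        current = {v: fit_and_fill(v, [p for p in names if p != v]) for v in order}
        if previous is not None:
            change = max(abs(a - b) for v in order
                         for a, b in zip(current[v], previous[v]))
            if change < tol:
                break
        previous = current
    return filled, rounds


# Hypothetical toy data: two complete variables and two with gaps.
rng = random.Random(0)
x1 = [rng.uniform(0, 10) for _ in range(40)]
x2 = [rng.uniform(0, 10) for _ in range(40)]
y1 = [1 + 2 * a - b + rng.gauss(0, 0.5) for a, b in zip(x1, x2)]
y2 = [3 - a + 0.5 * c + rng.gauss(0, 0.5) for a, c in zip(x1, y1)]
toy = {"X1": x1, "X2": x2, "Y1": list(y1), "Y2": list(y2)}
for i in (4, 17, 30):
    toy["Y1"][i] = None
for i in (2, 9, 11, 25, 33, 38):
    toy["Y2"][i] = None

completed, rounds = sequential_impute(toy)
```

Because this sketch imputes conditional means, rerunning it would return the same completed dataset every time; producing several distinct imputations requires adding randomness to the imputed values or to the coefficients.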

WGCI - Seminar, slide 6. Procedure (contd.):
- Finally, the missing values for each $Y_i$ are imputed using the corresponding converged regression model.
- To yield multiple imputations, the complete algorithm is repeated $m$ times, resulting in $m$ completed datasets.
- The $m$ datasets are analyzed and the results combined to yield final parameter estimates.
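The slides do not say how the $m$ sets of results are combined. A common choice is Rubin's rules, which the sketch below assumes: the pooled estimate is the mean of the $m$ estimates, and the total variance is the mean within-imputation variance plus $(1 + 1/m)$ times the between-imputation variance. The function name `combine_rubin` and the numbers are made up for illustration.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool m point estimates and their variances (Rubin's rules, assumed here)."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical estimates of one parameter from m = 5 completed datasets.
est = [10.0, 12.0, 11.0, 13.0, 9.0]
var = [4.0, 4.0, 4.0, 4.0, 4.0]
q, t = combine_rubin(est, var)
print(q, t)
```

Here the between-imputation variance is 2.5, so the total variance is about 7 against a within-imputation variance of 4. The difference reflects the extra uncertainty due to the missing data.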

WGCI - Seminar, slide 7. Flexibility:
- Basically all model types can be fitted.
- Stepwise regressions are possible, to ensure that only the most important predictors enter the model.
- Randomization can be introduced into the imputation process through a prior distribution on the regression coefficients and/or perturbation of the imputed values.
- No specific joint distribution has to be assumed.

WGCI - Seminar, slide 8. Computational realization:
- SAS-callable IVEware (Imputation and Variance Estimation software), developed by Raghunathan, Solenberger and Van Hoewyk.
- Allows for linear, logistic, Poisson and generalized logit models.
- Prior distributions on coefficients.
- Stepwise regressions with a specified $R^2$.
- Inclusion of interaction terms in the model.
- Restrictions on imputed variables.

WGCI - Seminar, slide 9. Example:
- The status of Least Developed Country (LDC) is based on an assessment of several requirements, including data on 4 key indicators.
- Missing values in these 4 key indicators make the assessment more difficult, so imputing them would be desirable.
- Objective: compare estimated values from the LDC dataset with imputations obtained using the srmi approach.

WGCI - Seminar, slide 10. Application to LDC dataset:
The LDC dataset consists of 4 variables:
- Child mortality rate per 1000 [ChldMort]
- Calorie intake as % of requirements, 1995/1997 [CalInt]
- 1st and 2nd level gross enrollment ratio, 1995 [EnrRatio]
- Adult literacy rate in %, 1995 [LitRate]

WGCI - Seminar, slide 11. Application to LDC dataset (contd.):
- The distributions are skewed and not normal, and the variables are highly intercorrelated.
> cor(ldcdata[,2:6],na.method="available")
[Correlation matrix of ChldMort, CalInt, EnrRatio, AdtLitRate and LDCInd; apart from a diagonal entry of 1.00, the values are not preserved in the transcript.]

WGCI - Seminar, slide 12. Application to LDC dataset (contd.):
[Figure with four panels: Literacy rate, Calorie intake, Child mortality, Enrollment ratio.]

WGCI - Seminar, slide 13. Application to LDC dataset (contd.):
Missingness distribution:
[Table with columns Variable, Observed, Imputed, Double counted and rows EnrRatio, LitRate, ChldMort, CalInt; values not preserved in the transcript.]

WGCI - Seminar, slide 14. Application to LDC dataset (contd.):
Model specifications:
- All variables are continuous.
- Bounds on imputed values are derived from the observed values (to guarantee sensible imputations).
- Minimum $R^2$ for inclusion of a variable in the model: 0.15.
- M = 5 imputations.
- Convergence diagnostics: stabilization of the coefficients and of the mean and variance of each imputed variable.
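The slides do not say how the bounds are enforced. One simple possibility, assumed in the sketch below, is to clip each imputed value to the range of the observed values; software may instead draw from a truncated distribution. The function name `bound_imputations` and the numbers are hypothetical.

```python
def bound_imputations(observed, imputed):
    """Clip imputed values to the [min, max] range of the observed values."""
    lo, hi = min(observed), max(observed)
    return [min(max(v, lo), hi) for v in imputed]


# Hypothetical observed and raw imputed values for one variable.
observed = [20.0, 35.0, 50.0, 90.0]
raw = [10.0, 40.0, 120.0]
bounded = bound_imputations(observed, raw)
print(bounded)  # [20.0, 40.0, 90.0]
```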

WGCI - Seminar, slide 15. Application to LDC dataset (contd.):
- The largest deviations between imputed and estimated values were observed for ChldMort (see note), although only 10 values were missing.
[Table of Sqrt(deviations^2) for ChldMort, CalInt, EnrRatio and LitRate; values not preserved in the transcript.]
- This is also reflected in the standard deviation of ChldMort.
- Note: the range of ChldMort is [0, 500], so larger deviations are possible than for the other variables.
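The measure Sqrt(deviations^2) is the absolute difference between an imputed value and the corresponding estimated value. The sketch below computes it for each value; summarizing the deviations by their root mean square is an assumption, since the slide does not say how they were aggregated. The function names and the numbers are invented for illustration and are not the LDC results.

```python
import math


def deviations(imputed, estimated):
    """Per-value sqrt((imputed - estimated)^2), i.e. the absolute deviation."""
    return [math.sqrt((a - b) ** 2) for a, b in zip(imputed, estimated)]


def root_mean_square(values):
    """One possible summary of the deviations (an assumption, see lead-in)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical imputed and estimated values for one variable.
imputed = [100.0, 150.0, 80.0]
estimated = [90.0, 155.0, 80.0]
devs = deviations(imputed, estimated)
print(devs)  # [10.0, 5.0, 0.0]
print(root_mean_square(devs))
```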

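WGCI - Seminar, slide 16. Application to LDC dataset (contd.):
[Table for ChldMort with columns Obs, Imputed, Combined and rows No, Min, Max, Mean, StdDev; values not preserved in the transcript.]
[Table for EnrRatio with columns Obs, Imputed, Combined and rows No, Min, Max, Mean, StdDev; values not preserved in the transcript.]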

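WGCI - Seminar, slide 17. Application to LDC dataset (contd.):
[Table for LitRate with columns Obs, Imputed, Combined and rows No, Min, Max, Mean, StdDev; values not preserved in the transcript.]
[Table for CalInt with columns Obs, Imputed, Combined and rows No, Min, Max, Mean, StdDev; values not preserved in the transcript.]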

WGCI - Seminar, slide 18. Application to LDC dataset (contd.):
- Srmi provides a flexible method for multiply imputing missing data without assuming a specific joint distribution for the data at hand.
- The srmi approach intuitively reflects the idea of using regression models for imputation.
- However, it is more powerful than a single regression because it fully exploits the covariance structure.