Practical Missing Data Analysis in SPSS (v17 onwards) Peter T. Donnan Professor of Epidemiology and Biostatistics.

Presentation transcript:

Practical Missing Data Analysis in SPSS (v17 onwards) Peter T. Donnan Professor of Epidemiology and Biostatistics

Objectives: How to impute missing values in SPSS, specifically by multiple imputation (MI). How to implement analyses with multiply imputed values. Interpretation of the output. Practical tips.

Example data: From a trial of pedometers plus advice vs. advice alone vs. controls in sedentary elderly women. Follow-up at 3 and 6 months. The main outcome measure was activity from accelerometer counts. 210 randomised / 170 at 3 months.

Example data – Pedometer trial: Read in the data file ‘SPSS Study databse.sav’. Main outcome: 3-month activity (AccelVM2). Baseline activity: AccelVM1a. Trial arm is represented by two dummy variables: Grp1 = pedometer vs. control; Grp2 = advice vs. control.
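The dummy coding described above can be sketched as follows. This is a minimal illustration, assuming control is the reference arm; the arm labels "pedometer", "advice" and "control" and the function name are hypothetical, since the source only names the variables Grp1 and Grp2.

```python
# Hypothetical arm labels; only the names Grp1 and Grp2 come from the slides.
VALID_ARMS = {"pedometer", "advice", "control"}

def dummy_code(arm):
    """Return (Grp1, Grp2) for one participant, with control as the reference arm."""
    if arm not in VALID_ARMS:
        raise ValueError(f"unknown trial arm: {arm!r}")
    grp1 = int(arm == "pedometer")  # 1 = pedometer, 0 = otherwise
    grp2 = int(arm == "advice")     # 1 = advice only, 0 = otherwise
    return grp1, grp2

for arm in ("pedometer", "advice", "control"):
    print(arm, dummy_code(arm))
```

Controls are coded (0, 0), so each regression coefficient for Grp1 or Grp2 is a contrast of that arm against control.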

Main analysis – Pedometer trial: Regression of 3-month activity, adjusting for baseline activity and the two dummy variables representing the trial arm contrasts.
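Written out, the main analysis corresponds to a linear model of roughly the following form. The variable names come from the slides; the coefficient symbols and the subscript i (participant) are notation assumed here for illustration.

```latex
\[
\mathrm{AccelVM2}_i = \beta_0 + \beta_1\,\mathrm{AccelVM1a}_i
  + \beta_2\,\mathrm{Grp1}_i + \beta_3\,\mathrm{Grp2}_i + \varepsilon_i
\]
```

Under this coding, \(\beta_2\) is the baseline-adjusted difference in 3-month activity between the pedometer arm and control, and \(\beta_3\) is the corresponding difference for advice vs. control.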

Main analysis – Pedometer trial: Note that n = 170, with 40 missing, in the complete case analysis, so there is potential for bias.

Missing at Random (MAR): Prob(Missing) is 1) independent of the unobserved data but 2) dependent on the observed data. Essentially, the observed data are a random sample of the full data within each stratum defined by the observed data. MAR is a weaker version of the missing completely at random (MCAR) assumption. If MAR is assumed, many methods are available to impute data using the observed data.
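The MAR idea can be illustrated with a small simulation. This is a sketch on invented data, not the trial data: the missingness probabilities, the sample size and all variable names are assumptions chosen for illustration. Missingness of the outcome depends only on the observed baseline, so the complete-case mean is biased overall, while within each baseline stratum the observed outcomes behave like a random sample.

```python
import random
import statistics

random.seed(1)

# Simulated (baseline, outcome, observed) triples; all values are invented.
rows = []
for _ in range(10000):
    baseline = random.gauss(0, 1)
    outcome = baseline + random.gauss(0, 1)
    # MAR: the chance of a missing outcome depends only on the observed baseline.
    p_missing = 0.8 if baseline < 0 else 0.1
    observed = random.random() >= p_missing
    rows.append((baseline, outcome, observed))

full_mean = statistics.mean(y for _, y, _ in rows)
cc_mean = statistics.mean(y for _, y, obs in rows if obs)

def stratum_means(low):
    """Mean outcome for everyone vs. observed cases within one baseline stratum."""
    stratum = [(y, obs) for x, y, obs in rows if (x < 0) == low]
    all_mean = statistics.mean(y for y, _ in stratum)
    obs_mean = statistics.mean(y for y, obs in stratum if obs)
    return all_mean, obs_mean

print("full mean:", round(full_mean, 3), "complete-case mean:", round(cc_mean, 3))
for low in (True, False):
    print("baseline < 0" if low else "baseline >= 0", stratum_means(low))
```

The complete-case mean is pulled towards participants with high baseline values, whereas the stratum-specific means of observed and full data stay close. That is the property imputation under MAR relies on.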

Comparison of completers at 3 months and drop-outs

Characteristic | Completers (n = 172) | Dropped out at 3 months (n = 32) | Chi-squared or t-test p-value
Age, mean (SD) | 77.1 (5.0) | 78.5 (5.6) |
Accelerometer VM, mean (SD) | (47991) | (50444) |
Limb function, mean (SD) | 8.69 (2.25) | 7.41 (2.86) |
NHS costs previous 3 months, mean (SD) | £ (306.74) | £ ( ) |
Pedometer group, N (%) | 58 (85.3%) | 10 (14.7%) |
BCI group, N (%) | 52 (77.6%) | 15 (22.4%) |
Control group, N (%) | 62 (92.5%) | 5 (7.5%) |
Stairs difficult: Yes | 48 (76.2%) | 15 (23.8%) |
Stairs difficult: No | 124 (87.9%) | 17 (12.1%) |
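The p-values of this comparison are not preserved in the transcript. As a hedged sketch of how such a comparison could be computed, the following applies a Pearson chi-squared test to the completer and drop-out counts by trial arm shown above. The function name is hypothetical, and the result is not necessarily the value reported on the original slide.

```python
import math

# (completers, drop-outs) by trial arm, taken from the comparison table.
counts = {"Pedometer": (58, 10), "BCI": (52, 15), "Control": (62, 5)}

def chi_squared(table):
    """Pearson chi-squared statistic for an r x 2 table of counts."""
    rows = list(table.values())
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    grand_total = sum(col_totals)
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(2):
            expected = row_total * col_totals[j] / grand_total
            stat += (r[j] - expected) ** 2 / expected
    return stat

stat = chi_squared(counts)
# With 3 arms and 2 outcomes there are 2 degrees of freedom, for which the
# chi-squared upper tail probability has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")
```

A difference in drop-out rates between arms would be evidence against MCAR and a reason to include trial arm in the imputation model.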

Execution of MI in SPSS: So, assuming MAR, we can use the available data to predict missing values in SPSS: Analyze > Multiple Imputation > Impute Missing Data Values

Execution of MI in SPSS: Enter ALL variables you think are associated with missingness. Note that the default number of imputations is 5. Create a new dataset to store the results. Note the icon indicating procedures that allow MI analysis.

Execution of MI in SPSS: The Automatic method lets SPSS choose; Custom gives more flexibility. Can include all 2-way interactions. Linear regression model prediction.

Execution of MI in SPSS: List of variables chosen. Define each variable as imputed, as a predictor, or BOTH. N.B. Recommend including the OUTCOME as both predictor and outcome.

Output of MI in SPSS: Note that the main interest is in the outcome VM2, but other factors with missing values are also imputed.

Step 2 - Using imputed datasets in analysis: Note that the new dataset has the IMPUTATION number as its first column. It contains, in order, the original dataset (n = 210) with IMPUTATION = 0 and, concatenated below it, 5 further datasets (each n = 210) with imputed values, IMPUTATION = 1 to 5. Most analyses can now be run on the imputed data where the fossil shell spiral symbol is present.
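The stacked layout described above can be mimicked in a few lines. This is a deliberately simplified sketch on toy data, not the SPSS algorithm. It imputes a missing outcome from a straight-line regression on baseline plus random noise, and stacks the original data (index 0) above m imputed copies (index 1 to m). A proper multiple imputation also reflects uncertainty in the regression parameters, which this sketch omits. All names and data values are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for y on x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    resid_sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, resid_sd

def stack_imputations(records, m=5, seed=0):
    """Return rows (imputation, baseline, outcome): original first, then m imputed copies."""
    rng = random.Random(seed)
    complete = [(b, y) for b, y in records if y is not None]
    intercept, slope, resid_sd = fit_line([b for b, _ in complete],
                                          [y for _, y in complete])
    stacked = [(0, b, y) for b, y in records]  # imputation 0 keeps the gaps
    for k in range(1, m + 1):
        for b, y in records:
            if y is None:
                # Predicted value plus noise, so imputed values vary between copies.
                y = intercept + slope * b + rng.gauss(0, resid_sd)
            stacked.append((k, b, y))
    return stacked

# Toy (baseline, outcome) pairs; None marks a missing outcome.
records = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (5.0, 9.8), (6.0, None)]
stacked = stack_imputations(records)
print(len(stacked), "rows;", stacked[6:12])
```

Observed values are identical in every copy; only the originally missing cells differ between imputations.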

Repeat main analysis – need pooled results: The procedure is exactly the same as before. SPSS will do the pooled analysis if the icon (above) is present in the drop-down menu.

Pooled analysis in SPSS: Results are presented for the original data and for each imputed dataset separately.

Results of pooled analysis from 5 imputed datasets

Model | B | SE | t | Sig. | Fraction missing
Pooled: Constant | | | | |
Pooled: AccelVM1a | | | | |
Pooled: Pedometer Group | | | | |
Pooled: Advice only | | | | |

Larger effect sizes in both groups. Greater power gives more significance.
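Pooling across imputed datasets is conventionally done with Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and its variance combines the average within-imputation variance with the between-imputation variance. The sketch below shows the arithmetic for a single coefficient. The input numbers are invented for illustration and are not the trial results. The "lambda" value is the simple proportion of total variance due to missingness, which may differ slightly from a degrees-of-freedom-adjusted fraction of missing information.

```python
import math
import statistics

def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(se ** 2 for se in std_errors)  # average sampling variance
    between = statistics.variance(estimates)                # variance across imputations
    total = within + (1 + 1 / m) * between
    return {
        "estimate": pooled,
        "se": math.sqrt(total),
        "lambda": (1 + 1 / m) * between / total,
    }

# Invented per-imputation coefficients and standard errors (m = 5).
result = pool_rubin([10, 12, 11, 13, 9], [2, 2, 2, 2, 2])
print(result)
```

The pooled standard error is larger than any single-imputation standard error whenever the estimates differ between imputations, which is how the uncertainty due to the missing data enters the final result.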

Interpretation: Compare the pooled results with the original results as a form of sensitivity analysis. If the results are similar, this suggests the original results are fairly robust. Consider whether MAR is a reasonable assumption. Consider whether you have included all factors related to the missingness (including the outcome) in the imputation model, as this is a crucial assumption.

Summary: SPSS now includes multiple imputation in its armoury. Consider the assumptions of MI. Compare results under different assumptions to assess the robustness of the results. If the MAR assumption is reasonable, MI provides results that are less biased than complete case analysis.