1
Missing Data
Peng Zhang, Jinnan Liu, Mei-ting Chiang, Yin Liu
12/29/2018
2
Outline
Introduction
Exercise 1
Exercise 2
Exercise 3
Conclusion
3
Introduction
Objectives:
Distinguish non-response mechanisms
Examine methods used to deal with non-response
-> Data Background
4
Variable name   Description

Health Determinants
AGE             non-negative integer, 1~15, no missing, ordinal
INCOME          non-negative integer, 1~11, 99 = 'NOT STATED' (i.e. missing), ordinal
DEPRESSION      probability of depression, 0~1 to 2 decimal places, no missing
CHRONIC         number of chronic conditions, non-negative integer, 0~20, no missing, continuous
VISITS          number of doctor visits, non-negative integer, possibly > 10, no missing, continuous
BodyMass        Body Mass Index (BMI)
SEX             binary: Male (1), Female (2)
SMOKING         smoking status, non-negative integer, 1~6, 99 = 'NOT STATED' (i.e. missing), ordinal

Health Status
HINDEX1         self-assessed quality of health, integer from 1 to 5, ordinal; 1 is good and 5 is bad
HINDEX2         health utility index, two decimals from 0 to 1, continuous; 1 is perfect and 0 is poor
5
Exercise One
Assessing the nature of the response mechanism:
MCAR (Missing Completely at Random)
MAR (Missing at Random)
NMAR (Not Missing at Random)
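For reference, the three mechanisms are usually distinguished by what the probability of non-response is allowed to depend on. The notation below is the standard textbook one and does not appear on the slides: R is the missingness indicator, Y_obs the observed values, and Y_mis the values that are missing.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{NMAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ still depends on } Y_{\text{mis}}
\end{align*}
```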
6
Assessing response mechanism (SAS output)

Analysis of Maximum Likelihood Estimates

Parameter     DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept                                                   <.0001
Intercept
Intercept                                                   <.0001
Intercept                                                   <.0001
age                                                         <.0001
sex
income                                                      <.0001
bodymass
smoking                                                     <.0001
depression
chronic                                                     <.0001
visits                                                      <.0001
7
Result of the Assessment
The response mechanism is NOT MCAR but MAR.
All of the following imputation is based on this response mechanism.
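A rough sketch of how such a check can be run: regress a missingness indicator on an observed covariate with logistic regression, and treat a significant slope as evidence against MCAR. This is a simplified, single-covariate version on simulated data, not the model behind the SAS output; the function `fit_logistic`, the simulated `age` values, and the missingness rule are all assumptions made for illustration.

```python
import math
import random

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x))).

    Returns (b0, b1, se_b1). Hypothetical helper, written for this sketch only.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error from the information matrix of the last iteration,
    # which matches the final estimate once the updates have converged.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

random.seed(0)
# Simulated (assumed) data: AGE coded 1..15, and INCOME is more likely
# to be 'NOT STATED' in higher age groups, i.e. missingness depends on AGE.
age = [random.randint(1, 15) for _ in range(2000)]
income_missing = [
    1 if random.random() < 1.0 / (1.0 + math.exp(3.0 - 0.2 * a)) else 0
    for a in age
]

b0, b1, se_b1 = fit_logistic(age, income_missing)
wald_chi_square = (b1 / se_b1) ** 2
print("slope:", round(b1, 3), "Wald chi-square:", round(wald_chi_square, 1))
```

A Wald chi-square above 3.84 (the 5% critical value on 1 degree of freedom) indicates that missingness depends on the observed covariate, so MCAR is not plausible. The check cannot separate MAR from NMAR, because that distinction depends on the unobserved values.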
8
Exercise Two
Deciding on the method to deal with the missing data, out of the popular methods:
Mean
Regression
Multiple Imputation
EM Algorithm
Nearest Neighbour
9
Conclusion of imputation
Comparing the results of the 5 methods, we conclude that regression imputation is the most efficient in our case.
-> Impute missing values using regression
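As a minimal sketch of what regression imputation does, the example below fits a straight line to the complete cases and replaces each value coded 99 ('NOT STATED') with its prediction. The single predictor, the toy data, and the function name `regression_impute` are assumptions for illustration; the slides do not say which covariates were used in the imputation model.

```python
MISSING = 99  # the data code 'NOT STATED' as 99

def regression_impute(x, y, missing=MISSING):
    """Replace missing y values with predictions from a least-squares
    line fitted to the complete cases. Hypothetical helper."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi != missing]
    n = len(observed)
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi if yi != missing else intercept + slope * xi
            for xi, yi in zip(x, y)]

def mean_impute(y, missing=MISSING):
    """Replace missing y values with the mean of the observed values."""
    observed = [yi for yi in y if yi != missing]
    mean_y = sum(observed) / len(observed)
    return [yi if yi != missing else mean_y for yi in y]

# Toy data (assumed): income rises by one category per age group.
age = [1, 2, 3, 4, 5, 6]
income = [3, 4, 5, 6, 99, 99]

by_regression = regression_impute(age, income)
by_mean = mean_impute(income)
print(by_regression)
print(by_mean)
```

Mean imputation gives every missing case the same value and ignores the trend with age, while regression imputation follows it. For an ordinal variable such as INCOME, the predictions would usually be rounded and clipped to the valid categories afterwards.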
10
Exercise Three
Analysis Comparison:
The linear mixed model
The log-linear regression model
Comparison of Hindex1 & Hindex2
11
Linear Mixed Model
Figure 1: The histogram of hindex2
12
Figure 2: The relationship between hindex2 and income in each age group
13
Figure 3: The relationship between hindex1 and income in each age group
14
Linear Mixed Model
Fit the linear mixed model.
Fixed effects: hindex2log ~ income + depression + chronic + visits

              Value   Std.Error   DF   t-value   p-value
(Intercept)                                      <.0001
income                                           <.0001
depression                                       <.0001
chronic                                          <.0001
visits                                           <.0001
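Written out, a model with these fixed effects could take the form below. The random-effects structure is not stated on the slide; a random intercept for age group i is assumed here only because the figures are broken down by age group, and hindex2log is taken to be a log transform of hindex2.

```latex
\begin{align*}
\text{hindex2log}_{ij} &= \beta_0 + \beta_1\,\text{income}_{ij} + \beta_2\,\text{depression}_{ij}
  + \beta_3\,\text{chronic}_{ij} + \beta_4\,\text{visits}_{ij} + b_i + \varepsilon_{ij}, \\
b_i &\sim N(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\end{align*}
```

Here j indexes respondents within age group i, the beta terms are the fixed effects reported in the table, and b_i is the assumed group-level random intercept.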
15
Log-linear Regression Model
Fit the log-linear model.

Coefficients:
              Value   Std. Error   t value
(Intercept)
age
income
smoking
chronic
visits
16
Association between Hindex1 & Hindex2
Figure 4: The relationship between hindex1 and hindex2

Coefficients:
              Value   Std. Error   t value
(Intercept)
hindex2log
17
Figure 5: The relationship between hindex1 and hindex2 in each age group
18
Conclusion
Hindex2, the health utility index, is a more objective, useful, and appropriate measure of health status than hindex1, the self-assessment answer, and it reveals more information. Hindex1 values are almost all close to 1 or 2, which suggests that, regardless of age and income level, people tend to overestimate their health status.
Age still plays the most important role in people's health status.
19
Thank you!
Statistical Society of Canada for providing the data
Prof. Peggy Ng for financial support
Prof. Peter Song for providing books on the EM algorithm
Mr. BaiFang Xing for helpful discussions