Peng Zhang Jinnan Liu Mei-ting Chiang Yin Liu

Presentation transcript:

Missing Data
Peng Zhang, Jinnan Liu, Mei-ting Chiang, Yin Liu
12/29/2018

Outline
Introduction
Exercise 1
Exercise 2
Exercise 3
Conclusion

Introduction
Objectives:
Distinguish non-response mechanisms
Examine methods used to deal with non-response
-> Data Background

Variable name   Description

Health Determinants
AGE             non-negative integer, 1~15; no missing; ordinal
INCOME          non-negative integer, 1~11; 99 = 'NOT STATED' (i.e. missing); ordinal
DEPRESSION      probability of depression, two decimal places, 0~1; no missing
CHRONIC         number of chronic conditions, non-negative integer, 0~20; no missing; continuous
VISITS          number of doctor visits, non-negative integer, possibly >10; no missing; continuous
BodyMass        Body Mass Index (BMI)
SEX             binary: Male (1), Female (2)
SMOKING         smoking status, non-negative integer, 1~6; 99 = 'NOT STATED' (i.e. missing); ordinal

Health Status
HINDEX1         self-assessed quality of health, integer from 1 to 5; ordinal; 1 is good and 5 is bad
HINDEX2         health utility index, two decimal places from 0 to 1; continuous; 1 is perfect and 0 is poor

Exercise One
Assessing the nature of the response mechanism:
MCAR (Missing Completely at Random)
MAR (Missing at Random)
NMAR (Not Missing at Random)
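One common way to probe the mechanisms listed above is to check whether the missingness indicator is associated with a fully observed variable. An association rules out MCAR, although it cannot separate MAR from NMAR. The sketch below is a simplified stand-in, not the analysis used in the deck: it simulates an AGE-like covariate (values 1 to 15), generates missingness flags under two assumed mechanisms, and compares the covariate between missing and observed rows with a two-sample z statistic. The function name, sample size, and missingness probabilities are all illustrative assumptions.

```python
import math
import random
import statistics


def missingness_z(covariate, is_missing):
    """Two-sample z statistic: covariate mean in missing rows vs observed rows."""
    miss = [x for x, m in zip(covariate, is_missing) if m]
    obs = [x for x, m in zip(covariate, is_missing) if not m]
    se = math.sqrt(
        statistics.variance(miss) / len(miss) + statistics.variance(obs) / len(obs)
    )
    return (statistics.mean(miss) - statistics.mean(obs)) / se


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated, fully observed covariate on the same 1..15 scale as AGE.
age = [rng.randint(1, 15) for _ in range(5000)]

# MCAR (assumed): every row has the same 20% chance of being missing.
mcar_flags = [rng.random() < 0.20 for _ in age]

# MAR (assumed): the chance of being missing increases with the observed age.
mar_flags = [rng.random() < 0.05 + 0.02 * a for a in age]

z_mcar = missingness_z(age, mcar_flags)
z_mar = missingness_z(age, mar_flags)
print(f"z under MCAR: {z_mcar:.2f}")
print(f"z under MAR:  {z_mar:.2f}")
```

A z statistic far from zero indicates that missingness depends on the observed covariate, which is evidence against MCAR.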

Assessing response mechanism
*SAS OUTPUT*
Analysis of Maximum Likelihood Estimates

Parameter    DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept     1   -1.3812    0.2657            27.0207     <.0001
Intercept2    1    0.7257    0.2640             7.5563     0.0060
Intercept3    1    2.8791    0.2803           105.4983     <.0001
Intercept4    1    5.0895    0.3701           189.1461     <.0001
age           1   -0.1260    0.0252            25.0844     <.0001
sex           1    0.1492    0.0937             2.5361     0.1113
income        1    0.1146    0.0198            33.5146     <.0001
bodymass      1   -0.00350   0.00385            0.8236     0.3641
smoking       1    0.1623    0.0224            52.5551     <.0001
depression    1   -0.2716    0.1819             2.2294     0.1354
chronic       1   -0.3970    0.0408            94.8984     <.0001
visits        1   -0.0371    0.00497           55.6429     <.0001
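The Chi-Square and Pr > ChiSq columns in the output above can be related with a short calculation. Each row has one degree of freedom, so the p-value is the upper tail of a chi-square distribution with 1 df, and the Wald statistic is approximately (Estimate / Standard Error) squared. The following is a minimal sketch using only the printed, rounded values, so small discrepancies from the SAS figures are expected. The function names are illustrative.

```python
import math


def wald_chi_square(estimate, std_error):
    """Wald statistic for a single coefficient: (estimate / SE) squared."""
    return (estimate / std_error) ** 2


def chi_square_p_value_1df(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Chi-Square values copied from the SAS table for the three non-significant rows.
reported_chi_square = {"sex": 2.5361, "bodymass": 0.8236, "depression": 2.2294}
p_values = {
    name: chi_square_p_value_1df(value)
    for name, value in reported_chi_square.items()
}
for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")

# Recomputing the statistic for age from the rounded estimate and standard error.
age_wald = wald_chi_square(-0.1260, 0.0252)
print(f"age: Wald chi-square from rounded inputs = {age_wald:.4f}")
```

The recomputed value for age differs slightly from the reported 25.0844 because the printed estimate and standard error are rounded.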

Result of Assessing
The response mechanism is NOT MCAR but MAR.
All of the following imputation is based on this response mechanism.

Exercise Two
Deciding on a method to deal with the missing data, out of the popular methods:
Mean
Regression
Multiple Imputation
EM Algorithm
Nearest Neighbour
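To make the first two methods in the list concrete, here is a minimal sketch of mean imputation and single-predictor regression imputation. It assumes the 99 = 'NOT STATED' missing code from the variable table and uses a tiny made-up data set. The function names, the toy values, and the choice of AGE as the only predictor are illustrative assumptions, not the deck's actual procedure.

```python
import statistics

MISSING = 99  # 'NOT STATED' code used for INCOME and SMOKING in the variable table


def mean_impute(values):
    """Replace each missing code with the mean of the observed values."""
    observed = [v for v in values if v != MISSING]
    fill = statistics.mean(observed)
    return [fill if v == MISSING else v for v in values]


def regression_impute(y, x):
    """Replace missing y with predictions from a least-squares line fitted on complete rows."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi != MISSING]
    x_bar = statistics.mean([xi for xi, _ in pairs])
    y_bar = statistics.mean([yi for _, yi in pairs])
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi if yi == MISSING else yi for xi, yi in zip(x, y)]


# Made-up toy data: income is missing (coded 99) for two respondents.
age = [1, 2, 3, 4, 5, 6]
income = [2, 4, 99, 8, 99, 12]

income_mean = mean_impute(income)
income_reg = regression_impute(income, age)
print("mean imputation:      ", income_mean)
print("regression imputation:", income_reg)
```

Mean imputation gives every missing respondent the same value, while regression imputation lets the filled-in value vary with the observed predictor, which is better suited to a MAR mechanism.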

Conclusion of imputation
-> Impute missing values using regression.
Comparing the results of the five methods, we conclude that regression imputation is the most efficient in our case.

Exercise Three
Analysis comparison:
The linear mixed model
The log-linear regression model
Comparison of Hindex1 & Hindex2

Linear Mixed Model
Figure 1: The histogram of hindex2

Figure 2: The relationship between hindex2 and income in each age group

Figure 3: The relationship between hindex1 and income in each age group

Linear Mixed Model
Fit the linear mixed model:
Fixed effects: hindex2log ~ income + depression + chronic + visits

              Value       Std.Error    DF     t-value     p-value
(Intercept)    1.535417   0.07323138   2376    20.96665   <.0001
income         0.028688   0.00400363   2376     7.16541   <.0001
depression    -0.400730   0.03896362   2376   -10.28473   <.0001
chronic       -0.059074   0.00765291   2376    -7.71910   <.0001
visits        -0.011897   0.00100756   2376   -11.80800   <.0001
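The fixed-effect estimates above define a linear predictor for hindex2log. The sketch below evaluates only that fixed-effects part for a hypothetical respondent. It ignores the random effects, which the slides do not report, and stays on the hindex2log scale because the slides do not define the transformation. The respondent's covariate values and the function name are made up for illustration.

```python
# Fixed-effect estimates copied from the linear mixed model table.
FIXED_EFFECTS = {
    "(Intercept)": 1.535417,
    "income": 0.028688,
    "depression": -0.400730,
    "chronic": -0.059074,
    "visits": -0.011897,
}


def fixed_effects_prediction(income, depression, chronic, visits):
    """Fixed-effects part of the predicted hindex2log (random effects not included)."""
    return (
        FIXED_EFFECTS["(Intercept)"]
        + FIXED_EFFECTS["income"] * income
        + FIXED_EFFECTS["depression"] * depression
        + FIXED_EFFECTS["chronic"] * chronic
        + FIXED_EFFECTS["visits"] * visits
    )


# Hypothetical respondent: income category 5, depression probability 0.10,
# 2 chronic conditions, 3 doctor visits.
prediction = fixed_effects_prediction(income=5, depression=0.10, chronic=2, visits=3)
print(f"fixed-effects prediction of hindex2log: {prediction:.6f}")
```

The signs match the table: higher income raises the predicted value, while depression, chronic conditions, and doctor visits lower it.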

Log-linear Regression Model
Fit the log-linear model:
Coefficients:

              Value          Std. Error       t value
(Intercept)    0.848604497   0.00204314170     415.34295
age            0.005055119   0.00019830895      25.49113
income        -0.027352596   0.00019163407    -142.73348
smoking       -0.030203852   0.00021805431    -138.51527
chronic        0.077974419   0.00033867474     230.23394
visits         0.006523856   0.00004247647     153.58752
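The t value column in the output above is the coefficient divided by its standard error. As a quick consistency check on the transcribed numbers, the following sketch recomputes each t value from the printed Value and Std. Error and compares it with the reported figure. The variable names are illustrative.

```python
# (Value, Std. Error, reported t value) copied from the log-linear model table.
COEFFICIENTS = {
    "(Intercept)": (0.848604497, 0.00204314170, 415.34295),
    "age": (0.005055119, 0.00019830895, 25.49113),
    "income": (-0.027352596, 0.00019163407, -142.73348),
    "smoking": (-0.030203852, 0.00021805431, -138.51527),
    "chronic": (0.077974419, 0.00033867474, 230.23394),
    "visits": (0.006523856, 0.00004247647, 153.58752),
}

# t value = estimate / standard error.
recomputed = {name: value / se for name, (value, se, _) in COEFFICIENTS.items()}

# Largest absolute gap between recomputed and reported t values.
max_gap = max(
    abs(recomputed[name] - reported)
    for name, (_, _, reported) in COEFFICIENTS.items()
)

for name, t in recomputed.items():
    print(f"{name:12s} t = {t:.5f}")
print(f"largest gap from reported t values: {max_gap:.6f}")
```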

Association between Hindex1 & Hindex2
Figure 4: The relationship between hindex1 and hindex2
Coefficients:

              Value        Std. Error   t value
(Intercept)    1.1409198   0.03973353    28.71428
hindex2log    -0.2669291   0.02450110   -10.89457

Figure 5: The relationship between hindex1 and hindex2 in each age group

Conclusion
Hindex2, the health utility index, is a more objective, useful, and appropriate measure of health status than hindex1, the self-assessment answer, and it reveals more information. Hindex1 values are nearly all close to 1 or 2, which suggests that, regardless of age or income level, people tend to overestimate their health status.
Age still plays the most important role in people's health status.

Thank you!
Statistical Society of Canada, for providing the data
Prof. Peggy Ng, for financial support
Prof. Peter Song, for providing books on the EM algorithm
Mr. BaiFang Xing, for helpful discussions