DATA STRUCTURES AND LONGITUDINAL DATA ANALYSIS Nidhi Kohli, Ph.D. Quantitative Methods in Education (QME) Department of Educational Psychology 1.



Exploratory Data Analysis  Structure of longitudinal data refers to the format of the dataset  Exploration of longitudinal data refers to initial steps in the data analytic process using graphical and descriptive methods 2

Format of Longitudinal Data  Wide Format:  It is the standard structure of longitudinal data  Referred to as subjects-by-variables format or multivariate format  Example: table with one row per subject and columns Subject, Wave 1 to Wave 4, and Medication (cell values not preserved in this transcript) 3

Format of Longitudinal Data  Wide Format:  Data in wide format are used in profile analysis with repeated measures ANOVA/MANOVA  Descriptive statistics, such as means, correlations (and covariances) between measurement occasions, can be computed easily when data are structured this way 4
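As a rough illustration of the point about descriptive statistics, the sketch below computes wave means, correlations, and covariances directly from a wide-format data frame in base R. The data frame `wide`, its column names, and its values are invented for illustration and do not come from the slides.

```r
# Hypothetical wide-format data: one row per subject, one column per wave.
wide <- data.frame(
  subid  = 1:5,
  pain.1 = c(7, 6, 8, 5, 7),
  pain.2 = c(6, 6, 7, 4, 6),
  pain.3 = c(5, 5, 7, 4, 4),
  pain.4 = c(3, 5, 6, 2, 4)
)

# Keep only the repeated-measures columns.
waves <- wide[, paste0("pain.", 1:4)]

wave_means <- colMeans(waves)  # mean at each measurement occasion
wave_cor   <- cor(waves)       # correlations between occasions
wave_cov   <- cov(waves)       # covariances between occasions

print(wave_means)
print(round(wave_cor, 2))
```

Because each occasion is its own column, no grouping or reshaping step is needed before calling `colMeans()`, `cor()`, or `cov()`.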

Format of Longitudinal Data  Long Format:  Referred to as univariate format  The time metric is explicit in long format, appearing as a time variable in its own column  Linear Mixed Effects (LME) Modeling requires the data to be in this format  Graphs of individual curves and graphs of means also require the data to be in long format 5

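Format of Longitudinal Data  Long Format:  Example: table with one row per subject per wave and columns Subject, Wave, Pain, and Medication (cell values not preserved in this transcript) 6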
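A minimal sketch of moving from wide to long format with base R's `reshape()` is given below. The data frame, the `pain.1` to `pain.4` column names, and all values are hypothetical stand-ins for the Subject, Wave, Pain, and Medication layout named on the slides.

```r
# Hypothetical wide-format data (values invented for illustration).
wide <- data.frame(
  subid      = 1:3,
  pain.1     = c(5, 6, 4),
  pain.2     = c(4, 6, 3),
  pain.3     = c(3, 5, 3),
  pain.4     = c(2, 5, 1),
  medication = c("A", "B", "A")
)

# Stack the four pain columns into one column and make wave explicit.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("pain.1", "pain.2", "pain.3", "pain.4"),
  v.names   = "pain",
  timevar   = "wave",
  times     = 1:4,
  idvar     = "subid"
)

# Sort by subject then wave, and drop the generated row names.
long <- long[order(long$subid, long$wave), ]
rownames(long) <- NULL

print(long)
```

The static variable `medication` is repeated on every row of a subject, while `wave` now appears as its own column, which is the layout that LME software expects.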

Balanced And Complete  What is a balanced design?  A balanced design is one in which all participants are measured at the same time points;  an unbalanced design occurs when not all participants are measured at the same time points 7

Balanced And Complete  What is a complete data set?  Complete data occur when there are no missing data: observations that were planned were realized;  incomplete data indicate missing data (i.e., observations that were planned but not realized) 8

Examples: Balanced And Complete  Unbalanced Design – Complete Data: one table per subject (Subject #1, #2, #3), each with Age and Score columns (cell values not preserved in this transcript) 9

Examples: Balanced And Complete  Balanced Design – Incomplete Data: table of scores by Subject and Age (cell values not preserved in this transcript) 10

Examples: Balanced And Complete  Unbalanced Design – Incomplete Data: table of scores by Subject and Age; only fragments survive in this transcript (NA, 11.6, NA, NA, 12.8, NA, 15.3, NA, 12.2)  *It is assumed that the researcher planned to measure subject #1 yearly but other subjects every two years. 11
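The two definitions can be checked mechanically on long-format data. The sketch below is one possible way to do so; the helper names `is_balanced()` and `is_complete()` and the example values are hypothetical, and it assumes that planned but unrealized observations are kept in the data frame as rows with `NA` for the response.

```r
# TRUE if every subject has the same set of measurement times.
is_balanced <- function(d, id, time) {
  times_by_subject <- lapply(split(d[[time]], d[[id]]),
                             function(t) sort(unique(t)))
  ref <- times_by_subject[[1]]
  all(vapply(times_by_subject, function(t) identical(t, ref), logical(1)))
}

# TRUE if no planned observation of the response is missing.
is_complete <- function(d, response) {
  !anyNA(d[[response]])
}

# Unbalanced design, complete data (invented values).
unbal_complete <- data.frame(
  subid = c(1, 1, 1, 2, 2, 2),
  age   = c(5, 6, 7, 5.5, 7, 8),
  score = c(10.1, 11.0, 12.4, 9.8, 11.9, 13.0)
)

# Balanced design, incomplete data (invented values).
bal_incomplete <- data.frame(
  subid = rep(1:2, each = 3),
  age   = rep(c(5, 6, 7), times = 2),
  score = c(10.1, 11.0, NA, 9.8, 11.9, 13.0)
)

print(c(balanced = is_balanced(unbal_complete, "subid", "age"),
        complete = is_complete(unbal_complete, "score")))
print(c(balanced = is_balanced(bal_incomplete, "subid", "age"),
        complete = is_complete(bal_incomplete, "score")))
```

Balance is a property of the planned time points, while completeness is a property of the realized responses, so the two checks are independent of each other.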

Treatment of Imbalance and Incompleteness  Data that come from an unbalanced design with missing data are sometimes treated as complete and balanced if: 1. the number of waves is equal for all participants, or 2. the researcher deletes data to force an equal number of waves 12

Treatment of Imbalance and Incompleteness  Consider incomplete data from an unbalanced design: table of scores by Subject and Age; only fragments survive in this transcript (NA, 11.6, NA, NA, 12.8, NA, 15.3, NA) 13

Treatment of Imbalance and Incompleteness  Suppose that, in the analysis, the successive waves of measurement rather than the timing of the observations were of most substantive importance: table of scores by Subject and Wave (cell values not preserved in this transcript) 14

Treatment of Imbalance and Incompleteness  In the example, the chronology metric (i.e., time scale) is ignored, and so is the variability in the timing of observations  Ignoring the time scale (e.g., age) may be indefensible, especially if the scores reflect some type of developmental phenomenon that is naturally tied to the time scale 15

Treatment of Imbalance and Incompleteness  The forced complete and balanced scenario is the only choice when either repeated measures ANOVA or MANOVA is used (criticism of these methods)  Longitudinal methods do not force data to be complete and balanced  These methods allow the observations to be anchored to the chronology metric rather than the order in which the observations were obtained 16
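To make the contrast between wave order and chronology concrete, the sketch below derives a wave index from the order of each subject's observations and compares it with the actual ages. The data frame `d` and its values are invented for illustration.

```r
# Hypothetical long-format data: subject 1 measured yearly,
# subject 2 measured every two years.
d <- data.frame(
  subid = c(1, 1, 1, 2, 2, 2),
  age   = c(5, 6, 7, 5, 7, 9),
  score = c(10.2, 11.6, 12.8, 9.5, 12.2, 15.3)
)

# Wave = position of the observation within each subject.
d$wave <- ave(d$age, d$subid, FUN = seq_along)

# Both subjects have waves 1 to 3, so the data look balanced by wave ...
print(table(d$subid, d$wave))

# ... but the time elapsed between first and last wave differs by subject.
age_span <- tapply(d$age, d$subid, function(a) diff(range(a)))
print(age_span)
```

Analyzing `score` against `wave` treats the two subjects as if they were measured on the same schedule, whereas analyzing it against `age` keeps the observations anchored to the chronology metric.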

Missing Data in LME Analysis  In the LME analysis, we will ignore missing data in the long format of the data structure  That is, any row of the long format data frame that has an NA will be omitted  When NAs occur only for the response variable, a subject is included in the LME analysis as long as they have at least one non-missing time point 17

Missing Data in LME Analysis  When NAs occur for a static / time-invariant predictor, then the entire record of the subject is deleted, meaning the subject is omitted from the analysis  In R the na.omit() function will omit any rows of a data frame that have at least one NA  To illustrate na.omit(), we select the first three subjects of our long format MPLS data (MPLS.long) 18

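Missing Data in LME Analysis 19
> MPLS.ls <- subset(MPLS.long, subid < 4, select = c(subid, read, grade, gen))
> MPLS.ls
(Printed output only partly preserved in this transcript. Columns: subid, read, grade, gen. Twelve rows are shown: gen is F for the first eight rows and M for the last four, and read is NA in the eighth row, at grade 8.)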

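Missing Data in LME Analysis  Suppose we apply the na.omit() function to the MPLS.ls data frame and save the result as omit1 20
> omit1 <- na.omit(MPLS.ls)
> omit1
(Printed output only partly preserved in this transcript. Eleven rows remain: gen is F for the first seven rows and M for the last four; the row with NA for read has been dropped.)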

Missing Data in LME Analysis  Another example, suppose we induce missing values for a static predictor  Let us assign NA to predictor gender, labeled as gen, for the first subject 21

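Missing Data in LME Analysis 22
> MPLS.ls1 <- MPLS.ls
> MPLS.ls1[1:4,4] <- NA
> MPLS.ls1
(Printed output only partly preserved in this transcript. After the assignment, gen is NA in rows 1 to 4; the surviving output shows F for the next four rows, with read NA at grade 8, and M for the last four rows.)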

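Missing Data in LME Analysis  Now we apply na.omit() to the MPLS.ls1 data frame and save the result as omit2 23
> omit2 <- na.omit(MPLS.ls1)
> omit2
(Printed output only partly preserved in this transcript. Seven rows remain: three with gen F and four with gen M; all rows of the first subject have been dropped.)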
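Since the printed values for the MPLS example did not survive, the same two steps can be reproduced on a stand-in data frame. The sketch below uses a hypothetical `toy` data frame with the same column names and the same pattern of missingness; the reading scores are invented and are not the MPLS values.

```r
# Hypothetical stand-in for MPLS.ls: 3 subjects, grades 5 to 8.
toy <- data.frame(
  subid = rep(1:3, each = 4),
  read  = c(170, 181, 184, 193,
            198, 207, 211, NA,    # subject 2 has no grade 8 score
            188, 196, 203, 212),
  grade = rep(5:8, times = 3),
  gen   = rep(c("F", "F", "M"), each = 4)
)

# Case 1: NA only on the response. One row is dropped; subject 2 stays.
omit1 <- na.omit(toy)
print(table(omit1$subid))

# Case 2: NA on a static predictor for subject 1. Every row of that
# subject contains the NA, so the whole subject is dropped.
toy1 <- toy
toy1[1:4, "gen"] <- NA
omit2 <- na.omit(toy1)
print(table(omit2$subid))
```

The first case loses a single time point, while the second removes a subject entirely, which is why missingness on static predictors is the more costly of the two.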

Retain or Omit Missing Data Rows?  In LME analysis there are two options we will consider for the long format data frame 1. the NA values can be left in the data frame or, 2. na.omit() can be used to eliminate the rows containing NA  For the LME analysis, the results will be identical either way 24
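The claim that both options give identical results can be checked on a small example. The sketch below assumes the `nlme` package (the slides do not name the fitting function) and uses invented data. Note that `nlme::lme()` stops on `NA` values by default (`na.action = na.fail`), so leaving the `NA` rows in requires asking for `na.omit` explicitly.

```r
library(nlme)

# Hypothetical long-format data with one missing response.
toy <- data.frame(
  subid = rep(1:3, each = 4),
  read  = c(170, 181, 184, 193,
            198, 207, 211, NA,
            188, 196, 203, 212),
  grade = rep(5:8, times = 3)
)

# Option 1: leave the NA in and let the fitting function drop the row.
fit_keep <- lme(read ~ grade, random = ~ 1 | subid,
                data = toy, na.action = na.omit)

# Option 2: remove the row beforehand with na.omit().
fit_drop <- lme(read ~ grade, random = ~ 1 | subid,
                data = na.omit(toy))

# Both fits use the same 11 rows, so the estimates agree.
print(all.equal(fixef(fit_keep), fixef(fit_drop)))
```

Either way the model is fitted to the same rows; the difference is only whether the omission happens before or inside the call.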

Retain or Omit Missing Data Rows? 1. If NA values occur only for the response variable, it is recommended the missing value rows be omitted with na.omit() and the resulting data frame used in all analyses 2. If NA values occur for static predictors, it is unclear if the missing data rows should be omitted  Alternatively, a subject might be retained or excluded depending on which static predictors are used 25

Retain or Omit Missing Data Rows?  For the LME analysis, the results are most valid assuming no missing values for the static predictors  Missing values are allowed on the response variable, with the validity of the analysis depending on certain assumptions about the missingness, such as Missing at Random (MAR) or Missing Completely at Random (MCAR) 26

Retain or Omit Missing Data Rows? 3. When the data frame contains more than one response variable and the pattern of missingness differs  If the response variables are to be analyzed separately, it might be tolerable to have the number of subjects vary based on the response variable analyzed 27

Missing Data Mechanisms 28  Can be thought of as a process that acts on the unobserved complete data set to produce the incomplete observed data set  Consider the measurement of a response variable at two time points  Suppose there is no missing data at the first time point but there is missing data at the second time point

Missing Data Mechanisms 29  The missing data mechanisms are defined by whether the missing data at time 2 depend on: 1. The observed data at time 1 2. The observed data at time 2 3. The missing data at time 2 4. None of the above

Missing Data Mechanisms 30  MCAR:  Characterized by number 4 in the list  The process is completely random and unrelated to observed or missing data  The incomplete observed sample is assumed to be a random sample of the unobserved complete data


Missing Data Mechanisms 32  Example of MCAR Process:  Suppose in the MPLS study, we want to relieve the response load by obtaining only four waves of data for two cohorts of individuals  Subjects are randomly selected for the cohorts and cohort 1 is measured over grades 5-8 and cohort 2 is measured over grades 6-9
Cohort  Grade 5  Grade 6  Grade 7  Grade 8  Grade 9
1       X        X        X        X        NA
2       NA       X        X        X        X

Missing Data Mechanisms 33  MAR:  Characterized by number 1 or number 2 in the list

Missing Data Mechanisms 34  An example of a MAR process is when the researcher decides not to measure some subjects at time 2 after observing their scores at time 1  This might occur if the researcher is studying methods of increasing reading scores and plans two repeated measurements  Subjects who have perfect scores at time 1 cannot increase over time and thus are not invited back for the second measurement

Missing Data Mechanisms 35  By knowing a subject's score at time 1, we can determine if they have a missing value at time 2  Note: the prediction of missing values can be based on variables other than the response variable

Missing Data Mechanisms 36  NMAR:  Characterized by number 3 in the list

Missing Data Mechanisms 37  Example of NMAR Process:  Consider a computer-administered reading test. After the test, a subject is allowed to see their reading score and then decide whether to retain it or delete it. A retained score is ultimately observed by the researcher, whereas a deleted score is not. If a subject's score decreases from time 1 to time 2, they might be unhappy and delete their time 2 score. In this case the missing data depend on the observed data at time 1 but also on the missing data at time 2, as the researcher never sees the deleted time 2 scores
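The three mechanisms can be imitated in a small simulation that mirrors the examples on the slides. The sketch below is illustrative only: the sample size, score distributions, and missingness rules are assumptions chosen to make the contrast visible.

```r
set.seed(1)
n  <- 1000
y1 <- rnorm(n, mean = 50, sd = 10)        # time 1 score, always observed
y2 <- y1 + rnorm(n, mean = 5, sd = 5)     # time 2 score before any deletion

# MCAR: missingness at time 2 is unrelated to any scores.
miss_mcar <- runif(n) < 0.3

# MAR: the researcher does not invite back the top scorers at time 1,
# so missingness depends only on the observed time 1 score.
miss_mar <- y1 > quantile(y1, 0.7)

# NMAR: subjects delete their time 2 score if it is lower than time 1,
# so missingness depends on the unobserved time 2 score itself.
miss_nmar <- y2 < y1

# Mean of the time 2 scores the researcher actually sees.
obs_means <- c(
  full = mean(y2),
  mcar = mean(y2[!miss_mcar]),
  mar  = mean(y2[!miss_mar]),
  nmar = mean(y2[!miss_nmar])
)
print(round(obs_means, 2))
```

Under MCAR the observed mean stays close to the full-data mean. Under MAR and NMAR the naive observed mean is shifted, but only in the MAR case can the shift be explained by data the researcher holds (the time 1 scores).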


Missing Data Techniques 40  Listwise Deletion (LD): removes cases that contain missingness on any of the variables included in the study.  LD can result in biased parameter estimates unless the data are MCAR (Little & Rubin, 2002).  Pairwise Deletion (PD): removes only cases that contain missingness on the variables used to obtain the target estimators.  PD typically yields inconsistent statistics.
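The difference between the two deletion rules shows up in how many cases feed each statistic. The sketch below uses base R's `cor()` with its `use` argument on an invented wide-format data frame; the values are hypothetical.

```r
# Hypothetical scores at three occasions, with scattered NAs.
d <- data.frame(
  y1 = c(10, 12, 14, 16, 18, 20),
  y2 = c(11, NA, 15, 18, 19, 22),
  y3 = c(12, 15, NA, 19, 21, NA)
)

# Listwise deletion: only rows complete on all variables are used.
n_listwise   <- nrow(na.omit(d))
cor_listwise <- cor(d, use = "complete.obs")

# Pairwise deletion: each correlation uses the rows complete on that pair.
obs          <- (!is.na(as.matrix(d))) + 0   # 1 = observed, 0 = missing
n_pairwise   <- crossprod(obs)               # cases available per pair
cor_pairwise <- cor(d, use = "pairwise.complete.obs")

print(n_listwise)
print(n_pairwise)
```

Under pairwise deletion each entry of the correlation matrix can rest on a different subset of cases, which is one reason the resulting statistics may not be mutually consistent.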

Missing Data Techniques 41  Last Observation Carried Forward (LOCF): missing values are replaced by the last observed value among the repeated measurements.  LOCF is unbiased in point estimation under MCAR or MAR, but will perform poorly in SE estimation.  Multiple Imputation (MI): uses stochastic imputation models to impute (fill in) multiple “complete” datasets. The estimates of model parameters are then combined over datasets (Rubin, 1978).
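LOCF as described above can be written in a few lines of base R. The helper `locf()` below is a hypothetical name, and the sketch makes one choice the slides do not specify: leading missing values, which have no earlier observation to carry forward, are left as `NA`.

```r
# Replace each NA with the most recent non-missing value before it.
locf <- function(x) {
  idx <- cumsum(!is.na(x))   # index of the last observed value so far
  idx[idx == 0] <- NA        # nothing observed yet: stay missing
  x[!is.na(x)][idx]
}

# One subject's repeated measurements (invented values).
scores <- c(NA, 3, NA, NA, 5, NA)
filled <- locf(scores)
print(filled)
```

In long-format data the function would be applied within each subject, for example with `ave(d$read, d$subid, FUN = locf)`, so that values are never carried across subjects.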

Missing Data Techniques 42  Full Information Maximum Likelihood (FIML): aims to find estimates of the parameters by maximizing the likelihood function of parameters given the observed data. Unlike imputation methods, FIML does not create “complete” datasets.  Both MI and FIML yield unbiased parameter estimates under MCAR or MAR.
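The step in MI where estimates are combined over datasets is commonly carried out with Rubin's rules: the pooled estimate is the average of the per-dataset estimates, and its variance adds the average within-imputation variance to an inflated between-imputation variance. The sketch below implements that pooling for a single parameter; the function name `pool_rubin()` and the input numbers are hypothetical, and the imputation step itself is not shown.

```r
# Pool one parameter across m imputed datasets using Rubin's rules.
pool_rubin <- function(est, se) {
  m       <- length(est)
  q_bar   <- mean(est)                   # pooled point estimate
  within  <- mean(se^2)                  # average within-imputation variance
  between <- var(est)                    # between-imputation variance
  total   <- within + (1 + 1 / m) * between
  c(estimate = q_bar, se = sqrt(total))
}

# Invented slope estimates and standard errors from m = 5 imputed datasets.
est <- c(4.8, 5.1, 5.0, 4.7, 5.3)
se  <- c(0.40, 0.42, 0.39, 0.41, 0.43)

pooled <- pool_rubin(est, se)
print(round(pooled, 3))
```

The pooled standard error is larger than the average per-dataset standard error whenever the estimates differ across imputations, which is how MI carries the uncertainty due to missingness into the final inference.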