When the Mean Isn’t Enough
Methods for Assessing Individual Differences Using SAS
Melissa McTernan, PhD -- CSU Sacramento
Talk Outline
- Introduction
  - Common focus on population means
  - Costs of means-only analysis
  - Motivation to look beyond the mean
- Methods for assessing individual differences in SAS
  - Preparing the data
  - Visualizing the individual
  - Mixed effects models
The Average Californian
Do you know her? The average person in California*
- Is 35 years old
- Is Latinx
- Is a woman
- Is overweight
- Is a Democrat and a …
- Shares a household with 1.95 other people
- Pays $1,851/mo for her mortgage
- Has a 28-minute commute to work
- Works 41.3 hours a week in a retail job
- Has $11,760 of student debt
- Drinks 4.8 alcoholic beverages a week
- … and has spent $295 on spontaneous purchases while under the influence
*Based on finder.com data and data from the US Census and the Bureau of Labor Statistics
Common Focus on the Population Mean
Many commonly used methods produce only mean estimates
- T-tests: Is the mean for group A different from the mean for group B?
- ANOVAs: Do groups A, B, and C have different means?
- Linear (simple or multiple) regression: How much does Y change for a one-unit change in X, for the average individual?
Limitations of Means-Only Methods
- The mean may or may not be a good summary of the data
- Even if the mean is a good summary, the mean still may not represent ANY individual in the population
  - Recall the example of the “average Californian”
- For which of these distributions is the mean more representative of the group?
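The question on this slide can be explored with simulated data. The sketch below is not from the original talk: the dataset `two_groups`, the variable names, and the distribution parameters are all hypothetical. It builds two groups with roughly the same mean (about 50), one unimodal and one bimodal, and then compares their summary statistics and histograms.

```sas
/* Hypothetical simulated data: two groups with (approximately) the same mean. */
data two_groups;
    call streaminit(1);
    length group $8;
    do i = 1 to 1000;
        /* Unimodal: everyone is near 50 */
        group = 'Unimodal';
        y = rand('normal', 50, 5);
        output;
        /* Bimodal: half the people are near 30, half near 70 */
        group = 'Bimodal';
        y = rand('normal', ifn(rand('bernoulli', 0.5), 30, 70), 5);
        output;
    end;
    drop i;
run;

/* The means are close, but the standard deviations are not. */
proc means data=two_groups mean std min max;
    class group;
    var y;
run;

/* Histograms side by side, with a reference line at the shared mean of 50. */
proc sgpanel data=two_groups;
    panelby group / columns=2;
    histogram y;
    refline 50 / axis=x lineattrs=(color=red thickness=2);
run;
```

In the bimodal panel, very few simulated individuals sit near the reference line, even though it marks the group mean.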
Limitations of Means-Only Methods
Implications
- Overgeneralization can lead us down an ugly path…
  - Clinical and medical interventions may work well for the average person, but may actually be harmful for individuals in certain subgroups
- Focus on the “average” may hide disparities
  - Example: A longitudinal study may show that, on average, student performance is increasing. Without looking at the individual learning curves, we miss the important fact that some students’ performance is not increasing, or is declining.
Methods for Assessing Individual Differences
Using NLSY97 Data, 2006-2008
Variables: Overall outlook on life, whether the participant has health care coverage, and a continuous measure of general health
Preparing the Data
“Wide” format vs. “Long” format
Preparing the Data
- ARRAY statements are a simple way to reshape the data, but inefficient for large datasets
- PROC TRANSPOSE is more efficient, but reshapes only one variable at a time
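Both reshaping approaches can be sketched on a toy dataset. The code below is an illustration, not the code from the talk: the dataset `nlsy_wide`, the variables `id` and `health2006`-`health2008`, and the data values are made up for the example.

```sas
/* Hypothetical wide dataset: one row per person, one column per year. */
data nlsy_wide;
    input id health2006 health2007 health2008;
    datalines;
1 3 3 4
2 5 4 4
3 2 3 3
;
run;

/* Option 1: DATA step with an ARRAY; writes one row per person-year. */
data nlsy_long;
    set nlsy_wide;
    array h{3} health2006-health2008;
    do t = 1 to 3;
        year   = 2005 + t;
        health = h{t};
        output;
    end;
    keep id year health;
run;

/* Option 2: PROC TRANSPOSE for the same variable (data must be sorted by id). */
proc sort data=nlsy_wide;
    by id;
run;

proc transpose data=nlsy_wide
               out=health_long(rename=(col1=health))
               name=source_var;
    by id;
    var health2006-health2008;
run;

/* Recover the year from the original column name, e.g. "health2006". */
data health_long;
    set health_long;
    year = input(substr(source_var, 7, 4), 4.);
    drop source_var;
run;
```

With several repeated measures (say, health and outlook), the ARRAY version extends by adding a second array inside the same loop, while the PROC TRANSPOSE version would typically need one call per measure followed by a merge.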
Visualizing Individual Differences
PROC SGPLOT to visualize complex data
- Allows you to build upon a base chart to add layers of chart components
- Examples:
  - Build a histogram with a density plot overlay
  - Build a scatterplot, then overlay a line of best fit
- Spaghetti plots for longitudinal data
  - Plot a trajectory for each individual across time
  - Overlay the mean trajectory
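The two layering examples named on this slide can be sketched with the `sashelp.class` sample dataset that ships with SAS. The choice of dataset and variables is an assumption made for illustration; the talk itself uses NLSY97 data.

```sas
/* Histogram (base layer) with a kernel density overlay (second layer). */
proc sgplot data=sashelp.class;
    histogram height;
    density height / type=kernel;
run;

/* Scatterplot (base layer) with a line of best fit (second layer). */
proc sgplot data=sashelp.class;
    scatter x=height y=weight;
    reg x=height y=weight / nomarkers;
run;
```

Each plot statement inside the procedure adds one layer, drawn in the order the statements appear.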
Visualizing Individual Differences
[Slide shows the plot built in two layers, labeled Layer 1 and Layer 2]
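A two-layer spaghetti plot might be built along the following lines. This is a sketch on simulated data, not the talk's code: the dataset `nlsy_long`, the variables `id`, `year`, and `health`, and all simulation parameters are hypothetical.

```sas
/* Hypothetical long-format data: 40 people, 3 yearly measurements each. */
data nlsy_long;
    call streaminit(97);
    do id = 1 to 40;
        u0 = rand('normal', 0, 0.8);   /* person-specific starting level */
        u1 = rand('normal', 0, 0.3);   /* person-specific rate of change */
        do year = 2006 to 2008;
            health = 3 + u0 + (0.1 + u1)*(year - 2006) + rand('normal', 0, 0.3);
            output;
        end;
    end;
    drop u0 u1;
run;

/* Mean trajectory: one row per year. */
proc means data=nlsy_long noprint nway;
    class year;
    var health;
    output out=year_means(drop=_type_ _freq_) mean=mean_health;
run;

/* Stack the mean rows under the individual rows so one plot can use both. */
data plot_data;
    set nlsy_long year_means;
run;

proc sgplot data=plot_data noautolegend;
    /* Layer 1: one thin gray line per person */
    series x=year y=health / group=id
           lineattrs=(color=gray pattern=solid thickness=1) transparency=0.6;
    /* Layer 2: the mean trajectory, drawn on top in red */
    series x=year y=mean_health / lineattrs=(color=red thickness=3);
    xaxis values=(2006 2007 2008);
    yaxis label="General health";
run;
```

Stacking the mean rows in a separate column keeps the red line from connecting across individuals, because `mean_health` is missing on every person-level row.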
Visualizing Individual Differences
What information would we be missing if we only plotted the red line?
Visualizing Individual Differences
PROC SGPANEL
- Also takes advantage of layering
- Allows you to compare plots side by side, using the PANELBY statement
- Now, we can look at individuals within subgroups, within the sample at large … rather than a mean trajectory across all groups and all people
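A paneled version of the spaghetti plot could look like the sketch below. The data are simulated and every name (`nlsy_long`, `id`, `time`, `health`, `covered`) is hypothetical. The red line here is a per-panel linear fit from the REG statement, used as a simple stand-in for a subgroup mean trajectory.

```sas
/* Hypothetical long-format data with a binary grouping variable. */
data nlsy_long;
    call streaminit(97);
    do id = 1 to 60;
        covered = rand('bernoulli', 0.6);   /* 1 = has health care coverage */
        u0 = rand('normal', 0, 0.8);
        u1 = rand('normal', 0, 0.3);
        do time = 0 to 2;                    /* 0, 1, 2 = 2006, 2007, 2008 */
            health = 3 + 0.4*covered + u0 + (0.1 + u1)*time
                     + rand('normal', 0, 0.3);
            output;
        end;
    end;
    drop u0 u1;
run;

proc sgpanel data=nlsy_long noautolegend;
    /* One panel per coverage group, placed side by side */
    panelby covered / columns=2;
    /* Layer 1: individual trajectories within each panel */
    series x=time y=health / group=id
           lineattrs=(color=gray pattern=solid thickness=1) transparency=0.6;
    /* Layer 2: a linear fit within each panel */
    reg x=time y=health / nomarkers lineattrs=(color=red thickness=3);
run;
```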
Accounting for Individual Differences with PROC MIXED
In longitudinal statistical analyses
Accounting for Individual Differences with PROC MIXED
- Linear mixed effects models allow you to add random effects to account for individual differences in model parameters
  - Add a random intercept to account for variance in intercept across individuals
  - Add a random slope to account for variance in slope across individuals
- First, let’s look at a model that only provides information about the typical person (i.e., a fixed effects model, or a model without any random effects)
Accounting for Individual Differences with PROC MIXED
First, let’s look at a model that only provides information about the typical person …
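A fixed-effects-only growth model of this kind might be written as follows. This is a sketch on simulated data rather than the model from the talk; the dataset `nlsy_long` and the variables `id`, `time`, and `health` are hypothetical.

```sas
/* Hypothetical long-format data: 200 people, 3 time points each. */
data nlsy_long;
    call streaminit(97);
    do id = 1 to 200;
        u0 = rand('normal', 0, 0.8);
        u1 = rand('normal', 0, 0.3);
        do time = 0 to 2;
            health = 3 + u0 + (0.1 + u1)*time + rand('normal', 0, 0.5);
            output;
        end;
    end;
    drop u0 u1;
run;

/* No RANDOM statement: one intercept and one slope for everyone. */
proc mixed data=nlsy_long method=ml;
    model health = time / solution;
run;
```

The Solution for Fixed Effects table gives a single intercept and a single slope, which describe the average trajectory only. All person-to-person variation is absorbed into the residual variance.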
Accounting for Individual Differences with PROC MIXED
- Add a RANDOM statement to add variance components for the growth parameters
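In PROC MIXED, person-level variance components for the intercept and slope are specified with the RANDOM statement, whereas the REPEATED statement structures the residual covariance. The sketch below adds a random intercept and slope to a linear growth model. It uses simulated data, and the names `nlsy_long`, `id`, `time`, and `health` are hypothetical.

```sas
/* Hypothetical long-format data: 200 people, 3 time points each. */
data nlsy_long;
    call streaminit(97);
    do id = 1 to 200;
        u0 = rand('normal', 0, 0.8);   /* true intercept deviation */
        u1 = rand('normal', 0, 0.3);   /* true slope deviation */
        do time = 0 to 2;
            health = 3 + u0 + (0.1 + u1)*time + rand('normal', 0, 0.5);
            output;
        end;
    end;
    drop u0 u1;
run;

proc mixed data=nlsy_long method=ml covtest;
    class id;
    model health = time / solution;
    /* Random intercept and random slope for each person;        */
    /* TYPE=UN also estimates the intercept-slope covariance.    */
    random intercept time / subject=id type=un;
run;
```

The Covariance Parameter Estimates table then reports the intercept variance, the slope variance, and their covariance, alongside the residual variance.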
Accounting for Individual Differences with PROC NLMIXED
In longitudinal statistical analyses
Accounting for Individual Differences with PROC NLMIXED
PROC NLMIXED is more flexible than PROC MIXED
- Non-linear mixed effects models
- Outcome may be non-normally distributed (e.g., binary)
- User-defined log-likelihood functions
[Output callout: variance in random intercept]
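A random-intercept logistic model for a binary outcome could be specified in PROC NLMIXED roughly as shown here. The data are simulated, and the dataset `nlsy_bin`, the outcome `covered`, the parameter names, and the starting values are all assumptions made for illustration.

```sas
/* Hypothetical binary outcome: 300 people, 3 time points each. */
data nlsy_bin;
    call streaminit(97);
    do id = 1 to 300;
        u = rand('normal', 0, 1);            /* person-specific intercept shift */
        do time = 0 to 2;
            eta = -0.5 + 0.3*time + u;
            covered = rand('bernoulli', 1/(1 + exp(-eta)));
            output;
        end;
    end;
    drop u eta;
run;

/* The input data must be grouped by the SUBJECT= variable (it is, by construction). */
proc nlmixed data=nlsy_bin;
    parms b0=0 b1=0 s2u=1;                   /* starting values */
    bounds s2u >= 0;                         /* a variance cannot be negative */
    eta = b0 + b1*time + u;                  /* linear predictor with random intercept */
    p   = 1 / (1 + exp(-eta));               /* inverse logit */
    model covered ~ binary(p);
    random u ~ normal(0, s2u) subject=id;
run;
```

The estimate of `s2u` in the Parameter Estimates table is the variance in the random intercept, i.e., how much individuals differ in their baseline log-odds.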
Accounting for Individual Differences with PROC GLIMMIX
In longitudinal statistical analyses
Accounting for Individual Differences with PROC GLIMMIX
PROC GLIMMIX is also very flexible
- Generalized linear mixed models
- Outcome may be non-normally distributed (e.g., binary)
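A comparable random-intercept logistic model in PROC GLIMMIX might look like the sketch below, again on simulated data with hypothetical names (`nlsy_bin`, `id`, `time`, `covered`).

```sas
/* Hypothetical binary outcome: 300 people, 3 time points each. */
data nlsy_bin;
    call streaminit(97);
    do id = 1 to 300;
        u = rand('normal', 0, 1);
        do time = 0 to 2;
            eta = -0.5 + 0.3*time + u;
            covered = rand('bernoulli', 1/(1 + exp(-eta)));
            output;
        end;
    end;
    drop u eta;
run;

proc glimmix data=nlsy_bin;
    class id;
    /* Model the probability that covered = 1 on the logit scale */
    model covered(event='1') = time / dist=binary link=logit solution;
    /* Random intercept for each person */
    random intercept / subject=id;
run;
```

Compared with PROC NLMIXED, the model is declared with MODEL and RANDOM statements in the style of PROC MIXED, rather than by writing out the linear predictor by hand.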
Comparing NLMIXED and GLIMMIX
- PROC GLIMMIX defaults to a pseudo-likelihood approach for estimating model parameters
- PROC NLMIXED defaults to maximum likelihood (ML), approximated with adaptive Gauss-Hermite quadrature
  - Note: This is the same approach PROC GLIMMIX uses when METHOD=QUAD is specified in the PROC GLIMMIX statement
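The difference in estimation method can be seen by fitting the same model twice in PROC GLIMMIX. The sketch below uses simulated data with hypothetical names (`nlsy_bin`, `id`, `time`, `covered`); the titles are only there to tell the two sets of output apart.

```sas
/* Hypothetical binary outcome: 300 people, 3 time points each. */
data nlsy_bin;
    call streaminit(97);
    do id = 1 to 300;
        u = rand('normal', 0, 1);
        do time = 0 to 2;
            eta = -0.5 + 0.3*time + u;
            covered = rand('bernoulli', 1/(1 + exp(-eta)));
            output;
        end;
    end;
    drop u eta;
run;

title "GLIMMIX default: pseudo-likelihood";
proc glimmix data=nlsy_bin;
    class id;
    model covered(event='1') = time / dist=binary link=logit solution;
    random intercept / subject=id;
run;

title "GLIMMIX with METHOD=QUAD: ML via adaptive quadrature";
proc glimmix data=nlsy_bin method=quad;
    class id;
    model covered(event='1') = time / dist=binary link=logit solution;
    /* METHOD=QUAD requires a SUBJECT= effect on the RANDOM statement */
    random intercept / subject=id;
run;
title;
```

The fixed-effect and variance estimates from the two runs will generally differ somewhat, and the METHOD=QUAD run is the one expected to line up with a PROC NLMIXED fit of the same model.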
Conclusions
What are the takeaways from this presentation?
Conclusions
- Visualizing only average trends or estimating only average parameters may provide information about the typical person, but that information alone is often not useful
  - The “typical” person may not even exist!
- SAS software offers many procedures for data management, data visualization, and data analysis that preserve information about the individual
- Making use of these procedures and incorporating them into practice can lead to more effective and informed interventions/responses
Contact Information
Name: Melissa McTernan, PhD
Sac State University
Sacramento, CA
Email: mcternan@csus.edu