Some key developments in data analysis Michael Babyak, PhD.

1 Some key developments in data analysis Michael Babyak, PhD

4 Areas of development: Discarding flawed techniques; New types of models; Treatment of missing data; Simulation and empirical tests; Validation

5 Techniques largely discredited or highly suspect: Categorization of continuous variables without good reason; Automated variable selection without validation; Overfitted or “cherry-picked” models
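To illustrate why categorizing a continuous variable is suspect, here is a minimal sketch that compares the correlation of an outcome with a continuous predictor against its correlation with a median-split version of the same predictor. The slope of .4 is borrowed from the simulation slide later in the deck; the sample size, the seed, the standard-normal predictor and error, and the helper name `pearson_r` are assumptions made for illustration, not part of the presentation.

```python
import random
import statistics

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20000               # hypothetical sample size

# Continuous predictor and an outcome that depends on it linearly.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.4 * xi + rng.gauss(0, 1) for xi in x]

# Median split: recode the continuous predictor as 0/1.
cut = statistics.median(x)
x_split = [1.0 if xi > cut else 0.0 for xi in x]

r_continuous = pearson_r(x, y)
r_split = pearson_r(x_split, y)
print(f"r with continuous x:   {r_continuous:.3f}")
print(f"r with median-split x: {r_split:.3f}")
```

Under these assumptions the median-split predictor should show a noticeably weaker association with the outcome, which is the information loss the slide warns about.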

6 New types of models: Regression family; Clustered data; Factor analysis family

7 Generalized Linear Model: General Linear Model / Linear Regression (ANOVA/t-test, ANCOVA) for Normal outcomes; Logistic Regression (Chi-square) for Binary/Binomial outcomes; Poisson, ZIP, Negbin, gamma for Count, heavy skew, lots of zeros; Transformed; Can be applied to clustered (e.g., repeated measures) data

8 Factor Analytic Family: Structural Equation Models; Partial Least Squares; Latent Variables (Common Factor Analysis); Multiple regression; Principal Components

9 You Use Latent Variables Every Day: A Single Measurement is an indicator of an underlying phenomenon, e.g., mercury rising in a sphygmomanometer measures the underlying construct of “blood pressure.” How do you improve the reliability of blood pressure measurement? Measure more than once, perhaps even in different settings (e.g., ambulatory monitoring). A Psychometric Scale is also a collection of indicators of an underlying process, attempting to triangulate on an underlying construct by multiple items (indicators). A Latent Variable is a collection of indicators with the unshared/unreliable part of the indicators removed. What’s the problem?
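The point about measuring more than once can be sketched with a small simulation: each person has a true blood pressure, each reading adds independent error, and the average of several readings tracks the true value more closely than a single reading does. All numbers here (mean 120, true-score SD 10, error SD 10, four readings, 5000 people) and the names `measure` and `pearson_r` are hypothetical choices for illustration only.

```python
import random
import statistics

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_people = 5000         # hypothetical number of people

# The underlying construct: each person's true blood pressure (assumed values).
true_bp = [rng.gauss(120, 10) for _ in range(n_people)]

def measure(true_value):
    """One noisy reading: the true value plus independent measurement error."""
    return true_value + rng.gauss(0, 10)

# One reading per person versus the mean of four readings per person.
single = [measure(t) for t in true_bp]
mean_of_four = [statistics.fmean([measure(t) for _ in range(4)]) for t in true_bp]

r_single = pearson_r(true_bp, single)
r_four = pearson_r(true_bp, mean_of_four)
print(f"correlation with true value, 1 reading:  {r_single:.3f}")
print(f"correlation with true value, 4 readings: {r_four:.3f}")
```

Averaging cancels part of the unshared error, which is the same idea a latent variable applies to a set of indicators.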

10 Missing Data: Imputation or related approaches are almost ALWAYS better than deleting incomplete cases; Multiple Imputation; Full Information Maximum Likelihood

11 Out of Missing Data Work: Propensity Scoring: “Matches” individuals on multiple dimensions to improve “baseline balance”; Complier Average Causal Effect (CACE): Generates a guess at the effect of a treatment among all potential compliers, including those in the control arm

12 Simulation Example: Y = .4 X + error [Diagram: repeated samples s1, s2, s3, s4, …, sk-1, sk each yield an estimate b_s1, b_s2, …, b_sk, which are then evaluated]

13 True Model: Y = .4*x1 + e
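The simulation described on these two slides can be sketched as follows: draw many samples from the true model Y = .4*x1 + e, estimate the slope in each sample, and then look at how the estimates are distributed. Only the true slope of .4 comes from the slides; the number of samples, the sample size, the standard-normal predictor and error, the seed, and the function names `ols_slope` and `simulate_slopes` are assumptions.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of y on x for a model with an intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n_samples=1000, n=50, true_b=0.4, seed=1):
    """Draw n_samples data sets from Y = true_b * X + e and return each slope estimate."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [true_b * x + rng.gauss(0, 1) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return slopes

# Evaluate the collection of estimates b_s1 ... b_sk.
slopes = simulate_slopes()
print(f"mean of estimated slopes: {statistics.fmean(slopes):.3f}")
print(f"SD of estimated slopes:   {statistics.stdev(slopes):.3f}")
print(f"smallest and largest:     {min(slopes):.3f}, {max(slopes):.3f}")
```

The average of the estimates should sit close to .4, while the spread shows how far any single sample can stray from the true value at this sample size.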

14 Validation: Split-half is better than nothing, but often too conservative; Bootstrap; Repeated splitting
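One common form of bootstrap validation is the optimism correction: refit the model on each bootstrap resample, compare its fit on the resample with its fit on the original data, and subtract the average difference from the apparent fit. The slide does not say which bootstrap variant is meant, so this is a sketch of one option under stated assumptions. The data are generated from the deck's Y = .4 X + error model; the sample size, number of resamples, seeds, and the names `fit_line`, `r_squared`, and `bootstrap_validate` are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Return (intercept, slope) from a least-squares fit of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def r_squared(model, xs, ys):
    """R^2 of a fitted (intercept, slope) model evaluated on the given data."""
    intercept, slope = model
    my = statistics.fmean(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst

def bootstrap_validate(xs, ys, n_boot=200, seed=2):
    """Return (apparent R^2, mean optimism, optimism-corrected R^2)."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    diffs = []
    for _ in range(n_boot):
        # Resample cases with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        model = fit_line(bx, by)
        # Optimism: fit on the resample minus fit on the original data.
        diffs.append(r_squared(model, bx, by) - r_squared(model, xs, ys))
    optimism = statistics.fmean(diffs)
    return apparent, optimism, apparent - optimism

# Hypothetical data set of 40 cases from Y = .4 X + error.
data_rng = random.Random(3)
xs = [data_rng.gauss(0, 1) for _ in range(40)]
ys = [0.4 * x + data_rng.gauss(0, 1) for x in xs]

apparent, optimism, corrected = bootstrap_validate(xs, ys)
print(f"apparent R^2:  {apparent:.3f}")
print(f"mean optimism: {optimism:.3f}")
print(f"corrected R^2: {corrected:.3f}")
```

Unlike a single split-half, every case is used both to fit and to evaluate, which is one reason the bootstrap tends to be less wasteful of data.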

15 Some Premises: “Statistics” is a cumulative, evolving field; Newer is not necessarily better, but should be entertained as regards the scientific question at hand; Keeping up is hard to do; There’s no substitute for thinking about the problem

16 http://www.duke.edu/~mababyak michael.babyak @ duke.edu http://symptomresearch.nih.gov/chapter_8/

