Research Support Center Chongming Yang SPSS Workshop
Causal Inference
  If A, then B, under condition C
  If A, then B with 95% probability, under condition C
Student's t Test (William S. Gosset's pen name = "Student")
  Assumptions
    Small sample
    Normally distributed
  t distribution: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
  df = degrees of freedom = number of independent observations ($n - 1$ for one sample)
Types of T Tests
  One sample: test against a specific (population) mean
  Two independent samples: compare means of two independent samples that represent two populations
  Paired: compare means of repeated samples
One Sample T Test
  Conceptually, convert the sample mean to a t score and examine whether t falls within the acceptable region of the distribution
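As a rough illustration of the conversion described above, the following Python sketch computes the t score for a one-sample test. The function name `one_sample_t` and the data are made up for illustration; SPSS performs the same calculation in its One-Sample T Test procedure.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """Return (t, df) for a test of the sample mean against the value mu."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample SD (n - 1 denominator)
    t = (mean - mu) / (s / math.sqrt(n))    # t = (x-bar - mu) / (s / sqrt(n))
    return t, n - 1

scores = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]     # hypothetical measurements
t, df = one_sample_t(scores, mu=5.0)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t is then compared with the critical value for the same df (2.571 for df = 5 at the two-tailed .05 level) to decide whether it falls inside the acceptable region.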
Two Independent Samples
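The slide gives only the heading, so the following is a sketch under an assumption: it uses the usual pooled-variance (equal variances assumed) t statistic, plus the unequal-variance (Welch) version that SPSS reports on its "Equal variances not assumed" row. The function names and data are hypothetical.

```python
import math
import statistics

def pooled_t(a, b):
    """Student's two-sample t, assuming equal population variances."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

def welch_t(a, b):
    """Welch's t, which does not assume equal variances (df is approximate)."""
    n1, n2 = len(a), len(b)
    q1, q2 = statistics.variance(a) / n1, statistics.variance(b) / n2
    se = math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return (statistics.mean(a) - statistics.mean(b)) / se, df

group_a = [1, 2, 3]   # hypothetical scores
group_b = [4, 5, 6]
t_pooled, df_pooled = pooled_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
```

When the two groups have equal sizes and equal variances, as in this toy data, both versions give the same t and df.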
Paired Observation Samples
  d = difference between the first and second observations
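A paired test can be sketched as a one-sample t test on the differences d. The code below assumes the standard form t = mean(d) / (SD(d) / sqrt(n)) with df = n - 1. The function name and data are hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for paired observations, using d = second - first."""
    d = [s - f for f, s in zip(first, second)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

before = [10, 12, 14]   # hypothetical first observations
after = [11, 14, 17]    # hypothetical second observations
t_paired, df_paired = paired_t(before, after)
```

Because each case serves as its own control, only the variability of the differences enters the standard error.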
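Multiple Group Issues
  Groups: A, B, C
  Pairwise comparisons: AB, AC, BC, each made at the .95 confidence level
  Joint probability that all three conclusions are correct (treating the comparisons as independent): .95 × .95 × .95 ≈ .86
  The chance of at least one false positive therefore rises from .05 to about .14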
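The arithmetic behind this problem can be checked with a few lines of Python. This is a sketch that treats the pairwise comparisons as independent, which is a simplifying assumption; the helper names are made up.

```python
def n_pairs(k):
    """Number of pairwise comparisons among k groups."""
    return k * (k - 1) // 2

def joint_confidence(conf, m):
    """Probability that m independent comparisons are all correct."""
    return conf ** m

m = n_pairs(3)                      # groups A, B, C give AB, AC, BC
joint = joint_confidence(0.95, m)   # 0.95 ** 3 = 0.857375
familywise_error = 1 - joint        # chance of at least one false positive
print(m, round(joint, 3), round(familywise_error, 3))
```

With more groups the problem grows quickly: five groups already produce ten pairwise comparisons.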
Analysis of Variance (ANOVA)
  Completely randomized groups
  Compare between-groups variance with within-groups variance to infer group mean differences
  Sources of total variance
    Within groups
    Between groups
  F distribution: $F = \dfrac{SSB / df_1}{SSW / df_2}$
    SSB = between-groups sum of squares
    SSW = within-groups sum of squares
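The partition into between-groups and within-groups sums of squares can be sketched in plain Python. The function name and data are hypothetical, and the sketch assumes the usual one-way layout with df1 = k - 1 and df2 = N - k.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df1, df2, SSB, SSW) for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between groups: group means around the grand mean, weighted by group size
    ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: scores around their own group mean
    ssw = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df1, df2 = k - 1, n_total - k
    f = (ssb / df1) / (ssw / df2)
    return f, df1, df2, ssb, ssw

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]  # made-up data
f_value, df1, df2, ssb, ssw = one_way_anova(groups)
```

A large F means the group means vary more than the within-group scatter alone would suggest.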
Fisher-Snedecor Distribution
F Test
  Null hypothesis: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ (all group means are equal)
  Given df1, df2, and the F value, determine whether the corresponding probability falls within the acceptable region of the distribution
Issues of ANOVA
  Indicates that some group difference exists
  Does not reveal which two groups differ
  Needs other tests to identify specific group differences
    Hypothesized comparisons: contrasts
    No hypothesized comparisons: post hoc tests
  ANOVA can be run as a multiple regression, and both are special cases of General Linear Modeling (GLM)
Multiple Linear Regression
  Causes: x can be continuous or categorical
  Effect: y is a continuous measure
  Milder causal term: predictors
  Objective: identify important x
Assumptions of Linear Regression
  Y and X have linear relations
  Y is continuous or interval, and unbounded
  Errors ε
    Expected value (mean) of ε = 0
    ε is normally distributed
    ε is not correlated with predictors
  Predictors should not be highly correlated
  No measurement error in any variable
Least Squares Solution
  Choose $\beta_0, \beta_1, \beta_2, \beta_3, \dots, \beta_k$ to minimize the sum of squared differences between the observed $y_i$ and the model-estimated (predicted) $\hat{y}_i$
  Found by solving a system of simultaneous equations
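The "many equations" are commonly written as the normal equations (X'X)b = X'y for the model y = b0 + b1*x1 + ... + bk*xk + error. The sketch below solves them in plain Python with Gauss-Jordan elimination. The helper names `solve` and `ols` and the data are hypothetical, and the code assumes the predictors are not perfectly collinear (otherwise a division by zero occurs).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] from rows of predictors."""
    X = [[1.0] + list(row) for row in xs]               # add intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefficients = ols(xs, y)
```

Because the toy data contain no error, the recovered coefficients match the generating values 1, 2, and 3 up to rounding.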
Explained Variance in 𝑦
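The slide gives only the heading. A common way to express explained variance, assumed here, is the coefficient of determination $R^2$, where $\hat{y}_i$ is the model prediction and $\bar{y}$ is the mean of the observed values:

```latex
R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

An $R^2$ of .30, for example, would mean that the predictors account for 30% of the variance in y.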
Standard Error of 𝛽
T Test of Significance of β
  t = β / SEβ
  If |t| > the critical value (p < .05), then β is significantly different from zero
Confidence Intervals of 𝛽
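For the one-predictor case, the standard error, t test, and confidence interval of the slope have short closed forms. The sketch below assumes those textbook formulas: SE = sqrt(MSE / Sxx), t = b1 / SE, and CI = b1 ± t_crit × SE. The function name and data are hypothetical, and the critical value is looked up by hand from a t table rather than computed.

```python
import math

def slope_inference(x, y, t_crit):
    """Return (b1, se_b1, t, ci) for the slope of a simple regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                   # error variance, df = n - 2
    se_b1 = math.sqrt(mse / sxx)
    t = b1 / se_b1
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t, ci

x = [1, 2, 3, 4, 5]                       # made-up data
y = [2, 4, 5, 4, 5]
# 3.182 is the two-tailed .05 critical value of t for df = 3 (from a t table)
b1, se_b1, t_stat, ci = slope_inference(x, y, t_crit=3.182)
```

In this toy example t is below the critical value and the interval contains zero, so both criteria agree that the slope is not significantly different from zero.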
Standardized Coefficient (Beta)
  Makes βs comparable among variables by putting them on the same scale (standardized scores)
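One way to see what standardization does is to rescale an unstandardized slope by the standard deviations of x and y. The sketch assumes the usual relation Beta = b × SD(x) / SD(y). The function name and data are hypothetical.

```python
import statistics

def standardized_beta(b, x, y):
    """Rescale an unstandardized slope b into standard-deviation units."""
    return b * statistics.stdev(x) / statistics.stdev(y)

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]
b = 0.6               # unstandardized least-squares slope of y on x for this data
beta = standardized_beta(b, x, y)
print(round(beta, 3))
```

With a single predictor the standardized coefficient equals the Pearson correlation between x and y; with several predictors each Beta is read as the change in y, in SD units, per one SD change in that predictor.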
Interpretation of β
  If x increases by one unit, y changes by β units, holding the other X variables constant
Model Comparisons
  Complete Model:
  Reduced Model:
  Test F = MSdrop / MSE
    MS = mean square
    MSE = mean square error
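The model equations are not shown on the slide. Assuming the usual nested-model setup, where the complete model has $k$ predictors, the reduced model keeps $g < k$ of them, and $n$ is the number of cases, the test can be written out as:

```latex
F = \frac{MS_{\text{drop}}}{MSE}
  = \frac{(SSE_R - SSE_C) / (k - g)}{SSE_C / (n - k - 1)}
```

Here $SSE_R$ and $SSE_C$ are the error sums of squares of the reduced and complete models. A significant F suggests that the dropped predictors contribute to the model.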
Variable Selection
  Select significant predictors from a pool of predictors
  Stepwise: undesirable, see http://en.wikipedia.org/wiki/Stepwise_regression
  Forward
  Backward (preferable)
Dummy-coding of Nominal x
  R = Race (1 = White, 2 = Black, 3 = Hispanic, 4 = Others)

  R   d1  d2  d3
  1   1   0   0
  2   0   1   0
  3   0   0   1
  4   0   0   0

  Include all dummy variables in the model, even if not every one is significant.
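The coding table above can be generated mechanically. In this sketch the function name `dummy_code` and the sample values are made up, and category 4 (Others) serves as the reference group, as in the table.

```python
def dummy_code(values, categories, reference):
    """Return one row of 0/1 dummies per value, omitting the reference category."""
    levels = [c for c in categories if c != reference]
    return [[1 if v == c else 0 for c in levels] for v in values]

race = [1, 2, 3, 4, 2]   # hypothetical cases
dummies = dummy_code(race, categories=[1, 2, 3, 4], reference=4)
for r, row in zip(race, dummies):
    print(r, row)
```

A variable with four categories needs only three dummies, because the reference group is identified by zeros on all of them.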
Interaction
  Create a product term X2X3
  Include X2 and X3 even if their main effects are not significant
  Interpret the interaction effect: the effect of X2 depends on the level of X3
Plotting Interaction
  Write out the model with main and interaction effects
  Use standardized coefficients
  Plug in some plausible values of the interacting variables and calculate y
  Use one X for the X dimension and the calculated y for the Y dimension
  See examples: http://frank.itlab.us/datamodel/node104.html
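The plug-in step can be sketched as follows. The coefficients are invented for illustration and do not come from any fitted model. The sketch evaluates y at low, middle, and high values of X2 (-1, 0, +1 in standardized units) for low and high X3, which yields the points for one plotted line per level of X3.

```python
def predict(x2, x3, b0, b2, b3, b23):
    """Predicted y from main effects plus the X2*X3 product term."""
    return b0 + b2 * x2 + b3 * x3 + b23 * x2 * x3

# Hypothetical standardized coefficients
coef = dict(b0=0.0, b2=0.5, b3=0.3, b23=0.2)

x2_values = [-1, 0, 1]
lines = {
    x3: [(x2, predict(x2, x3, **coef)) for x2 in x2_values]
    for x3 in (-1, 1)            # low and high levels of X3
}
for x3, points in lines.items():
    print("X3 =", x3, [(x2, round(yhat, 2)) for x2, yhat in points])
```

The two lines have different slopes (about 0.3 when X3 is low, about 0.7 when X3 is high), which is what "the effect of X2 depends on the level of X3" looks like on a plot.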
Diagnostics
  Linear relation of predicted and observed values (plotting)
  Collinearity
  Outliers
  Normality of residuals (save residuals as a new variable)
Repeated Measures (MANOVA, GLM)
  Measure(s) repeated over time
  Change in individual cases (within)?
  Group differences (between, categorical x)?
  Covariate effects (continuous x)?
  Interaction between within and between variables?
Assumptions
  Normality
  Sphericity: variances of the differences between all pairs of repeated measures are equal, so that the total sum of squares can be partitioned more precisely into
    Within subjects
    Between subjects
    Error
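Model
  $y_{ij} = \mu + \pi_i + \tau_j + (\pi\tau)_{ij} + \varepsilon_{ij}$
  $\mu$ = grand mean
  $\pi_i$ = constant of individual i
  $\tau_j$ = constant of the jth treatment
  $\varepsilon_{ij}$ = error of individual i under treatment j
  $(\pi\tau)_{ij}$ = interaction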
F Test of Effects
  F = MSbetween / MSwithin (simple repeated)
  F = MStreatment / MSerror (with treatment)
  F = MSwithin / MSinteraction (with interaction)
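As a rough sketch of the treatment test, the code below partitions the total sum of squares for a single-group repeated measures design with one score per subject per treatment. In that design the error term is what remains after removing subject and treatment effects, with df = (n - 1)(k - 1). The function name and data are hypothetical.

```python
def repeated_measures_f(data):
    """data[i][j] = score of subject i under treatment j. Returns (F, df1, df2)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    treatment_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_treatment = n * sum((m - grand) ** 2 for m in treatment_means)
    ss_error = ss_total - ss_subjects - ss_treatment
    df_treatment, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_treatment / df_treatment) / (ss_error / df_error)
    return f, df_treatment, df_error

# Three hypothetical subjects measured at three time points
scores = [[1, 3, 5],
          [2, 3, 7],
          [3, 6, 6]]
f_rm, df_treatment, df_error = repeated_measures_f(scores)
```

Removing stable between-subject differences from the error term is what gives repeated measures designs their extra power relative to independent groups.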
Four Types of Sums of Squares
  Type I: balanced design
  Type II: adjusting for other effects
  Type III: unbalanced design, no empty cells
  Type IV: empty cells
Exercise
  http://www.ats.ucla.edu/stat/spss/seminars/Repeated_Measures/default.htm
  Copy the data into an SPSS syntax window, then select and run it
  Run Repeated Measures GLM