Unit 3: Inferences about a Single Mean (1 Parameter models)
Units 3-4 Organization
First consider the details of the simplest model (one parameter estimate; mean-only model; no X's).
Next examine simple regression (two parameter estimates; one X for one quantitative predictor variable).
These provide a critical foundation for all linear models.
Subsequent units will generalize to one dichotomous predictor variable (Unit 5), multiple predictor variables (Units 6-7), and beyond.
Linear Models as Models
Linear models (including regression) are "models": DATA = MODEL + ERROR.
Three general uses for models:
1. Describe and summarize DATA (Ys) in a simpler form using the MODEL.
2. Predict DATA (Ys) from the MODEL. We will want to know the precision of prediction: How big is the error? Better prediction means less error.
3. Understand (test inferences about) complex relationships between individual regressors (Xs) in the MODEL and the DATA (Ys). How precise are the estimates of these relationships?
MODELS are simplifications of reality. As such, there is ERROR. Models also make assumptions that must be evaluated.
Fear-Potentiated Startle (FPS)
We are interested in producing anxiety in the laboratory. To do this, we developed a procedure in which we expose people to periods of unpredictable electric shock administration alternating with periods of safety. We measure their startle response in the shock and safe periods, and use the difference between startle during the shock vs. safe periods (shock – safe) to determine if they are anxious. This difference is called fear-potentiated startle (FPS). Our procedure works if FPS > 0. We need a model of FPS scores to determine if FPS > 0.
Fear-Potentiated Startle: One-Parameter Model
A very simple model for the population of FPS scores would predict the same value for everyone in the population:
Ŷi = β0
We would like this value to be the "best" prediction. In the context of DATA = MODEL + ERROR, how can we quantify "best"?
We want to predict some characteristic of the population of FPS scores that minimizes the ERROR from our model:
ERROR = DATA – MODEL
εi = Yi – Ŷi
There is an error (εi) for each population score. How can we quantify total model error?
Total Error
The sum of errors across all scores in the population, Σ(Yi – Ŷi), isn't ideal because positive and negative errors will tend to cancel each other.
The sum of the absolute values of the errors, Σ|Yi – Ŷi|, could work. If we selected β0 to minimize the sum of the absolute values of the errors, β0 would equal the median of the population.
The sum of squared errors (SSE), Σ(Yi – Ŷi)², could also work. If we selected β0 to minimize the sum of squared errors, β0 would equal the mean of the population.
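A quick base-R demonstration of these two claims (an illustrative sketch with simulated data, not part of the original FPS example):

set.seed(1)  # hypothetical data for illustration only
y = rnorm(1000, mean = 30, sd = 40)
sae = function(b) sum(abs(y - b))  # sum of absolute errors for candidate b
sse = function(b) sum((y - b)^2)   # sum of squared errors for candidate b
optimize(sae, interval = range(y))$minimum  # converges on median(y)
optimize(sse, interval = range(y))$minimum  # converges on mean(y)
median(y); mean(y)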
One-Parameter Model for FPS
For the moment, let's assume we prefer to minimize SSE (more on that in a moment). You should predict the population mean FPS for everyone:
Ŷi = β0, where β0 = μ
What is the problem with this model, and how can we fix it?
We don't know the population mean of FPS scores (μ). We can collect a sample from the population and use the sample mean (X̄) as an estimate of the population mean (μ). X̄ is an unbiased estimate of μ.
Model Parameter Estimation
Population model:
Ŷi = β0, where β0 = μ
Yi = β0 + εi
Estimate population parameters from a sample:
Ŷi = b0, where b0 = X̄
Yi = b0 + ei
Least Squares Criterion
In ordinary least squares (OLS) regression and other least squares linear models, the model parameter estimates (e.g., b0) are calculated such that they minimize the sum of squared errors (SSE) in the sample in which you estimate the model:
SSE = Σ(Yi – Ŷi)² = Σei²
Properties of Parameter Estimates
There are three properties that make a parameter estimate attractive:
Unbiased: The mean of the sampling distribution for the parameter estimate is equal to the value of that parameter in the population.
Efficient: The sample estimates are close to the population parameter. In other words, the narrower the sampling distribution for any specific sample size N, the more efficient the estimator. Efficient means a small SE for the parameter estimate.
Consistent: As the sample size increases, the sampling distribution becomes narrower (more efficient). Consistent means that as N increases, the SE for the parameter estimate decreases.
Least Squares Criterion
If the εi are normally distributed, both the median and the mean are unbiased and consistent estimators.
The variance of the sampling distribution for the mean is: σ² / N
The variance of the sampling distribution for the median is: πσ² / (2N)
Therefore the mean is the more efficient parameter estimate. For this reason, we tend to prefer to estimate our models by minimizing the sum of squared errors.
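A small simulation can make the efficiency difference concrete (an illustrative sketch, not from the original slides): for normal data, the sampling variance of the median is roughly π/2 ≈ 1.57 times that of the mean.

set.seed(2)  # hypothetical simulation settings
N = 100
means   = replicate(10000, mean(rnorm(N)))    # sampling distribution of the mean
medians = replicate(10000, median(rnorm(N)))  # sampling distribution of the median
var(medians) / var(means)  # approximately pi/2 = 1.57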
Fear-Potentiated Startle During Threat of Shock
> FilePath = 'G:/LectureDataR'
> FileName = '3_SingleMean_FPS.dat'
> d = dfReadDat(file.path(FilePath, FileName))
> str(d)
'data.frame': 96 obs. of 1 variable:
 $ FPS: num -98.098 -22.529 0.463 1.194 2.728 ...
> head(d)
            FPS
0011  19.490928
0012  48.406944
0013 -22.528500
0014   6.723783
0015  89.658722
0016  40.573778
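An aside: dfReadDat() comes from the course's support package. If it is unavailable, a plain base-R read would look roughly like this (the exact arguments are an assumption about how the .dat file is delimited):

d = read.table(file.path(FilePath, FileName), header = TRUE, row.names = 1)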
Descriptives and Univariate Plots
> varDescribe(d)
    var  n  mean    sd median   min    max skew kurtosis
FPS   2 96 32.19 37.54  19.46 -98.1 162.74 0.62     1.93
> windows()  # on a Mac, use quartz()
> par('cex' = 1.5, 'lwd' = 2)
> hist(d$FPS)
FPS Experiment: The Inference Details
Goal: Determine if our shock threat procedure is effective at potentiating startle (increasing startle during threat relative to safe).
1. Create a simple model of FPS scores in the population: FPSi = β0
2. Collect a sample of N = 96 to estimate β0.
3. Calculate the sample parameter estimate (b0) that minimizes SSE in the sample.
4. Use b0 to test the hypotheses:
H0: β0 = 0
Ha: β0 ≠ 0
Estimating a One-Parameter Model in R
> m = lm(FPS ~ 1, data = d)
> modelSummary(m)
lm(formula = FPS ~ 1, data = d)
Observations: 96

Linear model fit by least squares

Coefficients:
            Estimate    SE     t          Pr(>|t|)    
(Intercept)   32.191 3.832 8.402 0.000000000000426 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sum of squared errors (SSE): 133888.3, Error df: 95
R-squared: 0.0000

The SSE tells us how well the model fits the data. Specifically, it is the sum of the squared differences between the predicted values and the actual participant scores, where ei = Yi – Ŷi.
Errors/Residuals
ei = Yi – Ŷi
R can report the errors for each individual in the sample:
> modelErrors(m)
        0011         0012         0013         0014         0015         0016         0021
 -12.6999127   16.2161040  -54.7193405  -25.4670572   57.4678817    8.3829373   -2.7175738
        0022         0023         0024         0025         0026         0111         0112
 -16.8541238   19.6643817   58.6873817   78.7543262   35.0963817  -29.4627960   72.7258928
        0113         0114         0115         0116         0121         0122         0123
 -31.7275460   36.5672151   19.1260706  -30.9964738    1.5669373   11.7176040    9.3662151
        0124         0125         0126         1011         1012         1013         1014
 -25.3710072 -130.2886183   53.1913817   29.8681317   59.8164373  -14.1219516   34.7095484
        1015         1016         1021         1022         1023         1024         1025
  17.9774928   47.3338484   61.4058262   67.7537262  104.6339928   36.5526595   14.2658262
        1026         1111         1112         1113         1114         1115         1116
 -16.7506349  -29.6592294   12.9909373   20.9858817  -29.1695572  -24.1598966  -19.2076849
        1121         1122         1123         1124         1125         1126         2011
  11.7108262  -25.2434516  -18.4250627  -20.3317905   -8.4337683  -18.0094960  -12.7704849
        2012         2013         2014         2015         2016         2021         2022
   3.9210484  -58.2597294  -35.5108960  -32.0183927   -1.7377294    0.3123817  -35.5405016
        2023         2024         2025         2026         2111         2112         2113
 -12.5921183   25.0772151  -20.6439405   37.4066428    9.3974373  130.5457706    5.2138262
        2114         2115         2116         2121         2122         2123         2124
 -13.0036627   -9.8150183  -27.4784549   17.0578817   27.5951151  -28.0089794  -28.5735072
        2125         2126         3011         3012         3013         3014         3015
 -23.4260627    4.5087151   77.8639373  -21.4575572  -18.5716738  -17.1700072   27.4325484
        3016         3021         3022         3023         3024         3025         3026
 -26.4386960  -18.1054016    6.1488262  -14.5139683    1.6943262  -21.4997294  -25.3833322
        3111         3112         3113         3114         3115         3116         3121
 -26.9358794  -17.5872294  -25.7722738    4.8073817  -26.9565572  -32.1845627  -31.0086183
        3122         3123         3124         3125         3126
 -34.0540127  -17.4630572  -31.4756127  -31.8114616  -15.9328183
You can also manually calculate the SSE easily:
> sum(modelErrors(m)^2)
[1] 133888.3
Coefficients (Parameter Estimates)
> modelSummary(m)
lm(formula = FPS ~ 1, data = d)
Observations: 96

Linear model fit by least squares

Coefficients:
            Estimate    SE     t          Pr(>|t|)    
(Intercept)   32.191 3.832 8.402 0.000000000000426 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sum of squared errors (SSE): 133888.3, Error df: 95
R-squared: 0.0000

The estimate is b0, the unbiased sample estimate of β0, reported with its standard error. It is also called the intercept in regression (more on this later).
Ŷi = b0 = 32.2
> coef(m)
(Intercept) 
   32.19084
Predicted Values
Ŷi = 32.19
You can get the predicted value for each individual in the sample using this model:
> modelPredictions(m)
    0011     0012     0013     0014     0015     0016     0021     0022     0023     0024 
32.19084 32.19084 32.19084 32.19084 32.19084 32.19084 32.19084 32.19084 32.19084 32.19084 
...
(remaining output omitted: the predicted value is 32.19084 for all 96 observations)
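An aside (not in the original slides): base R's fitted() and predict() return the same constant prediction for this mean-only model:

unique(fitted(m))  # 32.19084 -- the same value for all 96 observations
head(predict(m))   # equivalent for the estimation sample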
Testing Inferences about β0
> modelSummary(m)
lm(formula = FPS ~ 1, data = d)
Observations: 96

Linear model fit by least squares

Coefficients:
            Estimate    SE     t          Pr(>|t|)    
(Intercept)   32.191 3.832 8.402 0.000000000000426 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sum of squared errors (SSE): 133888.3, Error df: 95
R-squared: 0.0000

The t-statistic tests the H0 that β0 = 0. The probability (p-value) of obtaining a sample b0 = 32.2 if H0 is true (β0 = 0) is < .0001. Describe the logic of how this was determined, given your understanding of sampling distributions.
Sampling Distribution: Testing Inferences about β0
H0: β0 = 0; Ha: β0 ≠ 0
If H0 is true, the sampling distribution for b0 will have a mean of 0. We can estimate the standard deviation of the sampling distribution with the SE for b0.
t(df = N – P) = (b0 – 0) / SEb0 = (32.2 – 0) / 3.8 = 8.40
b0 is approximately 8 standard deviations above the expected mean of the sampling distribution if H0 is true.
> pt(8.40, 95, lower.tail = FALSE) * 2
[1] 4.293253e-13
The probability of obtaining a sample b0 = 32.2 (or more extreme) if H0 is true is very low (< .05). Therefore we reject H0 and conclude that β0 ≠ 0 and that b0 is our best (unbiased) estimate of it.
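The same t-test can be computed "by hand" from the fitted model object (a sketch using standard base-R accessors):

b0 = coef(m)[1]              # parameter estimate: 32.19
se = sqrt(diag(vcov(m)))[1]  # its standard error: 3.83
t  = (b0 - 0) / se           # t-statistic under H0: beta0 = 0
2 * pt(abs(t), df = df.residual(m), lower.tail = FALSE)  # two-tailed p-value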
Statistical Inference and Model Comparisons
Statistical inference about parameters is fundamentally about model comparisons. You are implicitly (t-test of a parameter estimate) or explicitly (F-test of a model comparison) comparing two different models of your data.
We follow Judd et al. and call these two models the compact model and the augmented model. The compact model represents reality as the null hypothesis predicts. The augmented model represents reality as the alternative hypothesis predicts.
The compact model is simpler (fewer parameters) than the augmented model. It is also nested in the augmented model (i.e., its parameters are a subset of the augmented model's parameters).
Model Comparisons: Testing Inferences about β0
H0: β0 = 0
Ha: β0 ≠ 0
Compact model: Ŷi = 0
Augmented model: Ŷi = β0 (estimated by b0)
We estimate 0 parameters (P = 0) in the compact model.
We estimate 1 parameter (P = 1) in the augmented model.
Choosing between these two models is equivalent to testing whether β0 = 0, as you did with the t-test.
Model Comparison Plots
Model Comparisons: Testing Inferences about β0
Compact model: Ŷi = 0
Augmented model: Ŷi = β0 (estimated by b0)
We can compare (and choose between) these two models by comparing their total error (SSE) in our sample:
SSE = Σ(Yi – Ŷi)²
SSE(C) = Σ(Yi – 0)²
> sum((d$FPS - 0)^2)
[1] 233368.3
SSE(A) = Σ(Yi – 32.19)²
> sum((d$FPS - coef(m)[1])^2)
[1] 133888.3
> # same as: sum(modelErrors(m)^2)
Model Comparisons: Testing Inferences about β0
Compact model: Ŷi = 0; SSE(C) = 233,368.3; P = 0
Augmented model: Ŷi = β0 (estimated by b0); SSE(A) = 133,888.3; P = 1
F(PA – PC, N – PA) = [(SSE(C) – SSE(A)) / (PA – PC)] / [SSE(A) / (N – PA)]
F(1 – 0, 96 – 1) = [(233368.3 – 133888.3) / (1 – 0)] / [133888.3 / (96 – 1)]
F(1, 95) = 70.59, p < .0001
> pf(70.58573, 1, 95, lower.tail = FALSE)
[1] 4.261256e-13
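R's built-in anova() implements exactly this SSE(C) vs. SSE(A) comparison for nested models (a sketch; the zero-parameter compact model can be fit with a ~ 0 formula):

mC = lm(FPS ~ 0, data = d)  # compact: no parameters, predicts 0 for everyone
mA = lm(FPS ~ 1, data = d)  # augmented: one parameter, b0
anova(mC, mA)               # F(1, 95) = 70.59, p < .0001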
Sampling Distribution vs. Model Comparison
The two approaches to testing H0 about parameters (β0, βj) are statistically equivalent. They are complementary approaches with respect to conceptual understanding of GLMs.
Sampling distribution:
Focuses on population parameters and their estimates.
Tight connection to sampling and probability distributions.
Builds understanding of the SE (sampling error/power; confidence intervals; graphic displays).
Model comparison:
Focuses on the models themselves.
Highlights model fit (SSE) and model parsimony (P).
Clearer link to PRE (ηp²).
Can test comparisons that differ by more than 1 parameter (discouraged).
Effect Sizes
Your parameter estimates are descriptive. They describe effects in the original units of the IVs and DV. Report them in your paper.
There are many other effect size estimates available. You will learn two that we prefer:
Partial eta² (ηp²): Judd et al. call this PRE (proportional reduction in error).
Eta² (η²): This is also commonly referred to as R² in regression.
Partial Eta² or PRE
Compact model: Ŷi = 0; SSE(C) = 233,368.3; P = 0
Augmented model: Ŷi = β0 (estimated by b0); SSE(A) = 133,888.3; P = 1
How much was the error reduced in the augmented model relative to the compact model?
PRE = (SSE(C) – SSE(A)) / SSE(C) = (233,368.3 – 133,888.3) / 233,368.3 = .426
Our more complex model that includes β0 reduces prediction error (SSE) by approximately 43%. Not bad!
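PRE can be computed directly from the two models' SSEs (a sketch, reusing the mC and mA models fit in the comparison code above):

sseC = sum(residuals(mC)^2)  # 233368.3
sseA = sum(residuals(mA)^2)  # 133888.3
(sseC - sseA) / sseC         # 0.426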
Confidence Interval for b0
A confidence interval (CI) is an interval for a parameter estimate in which you can be fairly confident that you will capture the true population parameter (in this case, β0). Most commonly reported is the 95% CI. Across repeated samples, 95% of the calculated CIs will include the population parameter.
> confint(m)
               2.5 %   97.5 %
(Intercept) 24.58426 39.79742
Given what you now know about confidence intervals and sampling distributions, what should the formula be?
CI(b0) = b0 ± t(α; N – P) × SEb0
For the 95% confidence interval, this is approximately ± 2 SEs around our unbiased estimate of β0.
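The CI can be reproduced "by hand" from this formula (a sketch that should match confint(m)):

b0 = coef(m)[1]
se = sqrt(diag(vcov(m)))[1]
b0 + c(-1, 1) * qt(.975, df = df.residual(m)) * se  # 24.58, 39.80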
Confidence Interval for b0
How can we tell if a parameter is "significant" from the confidence interval?
If a parameter estimate differs from 0 at α = .05, then the 95% confidence interval for that parameter estimate should not include 0. The same logic applies when testing whether the population parameter equals any other non-zero value.
The One-Parameter (Mean-Only) Model: Special Case
What special case (specific analytic test) is statistically equivalent to the test of the null hypothesis β0 = 0 in the one-parameter model?
The one-sample t-test, testing if a population mean = 0.
> t.test(d$FPS)

	One Sample t-test

data:  d$FPS
t = 8.4015, df = 95, p-value = 4.261e-13
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 24.58426 39.79742
sample estimates:
mean of x 
 32.19084
Testing β0 = Non-Zero Values
How could you test an H0 regarding β0 = some value other than 0 (e.g., 10)? HINT: There are at least three methods.
Option 1: Compare the SSE for the augmented model (Ŷi = b0) to the SSE from a different compact model for this new H0 (Ŷi = 10).
Option 2: Recalculate the t-statistic using this new H0:
t = (b0 – 10) / SEb0
Option 3: Check whether the confidence interval for the parameter estimate contains this other value. (No p-value provided.)
> confint(m)
               2.5 %   97.5 %
(Intercept) 24.58426 39.79742
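Options 2 and 3 in code (a sketch; t.test()'s mu argument gives the same test directly):

b0  = coef(m)[1]
se  = sqrt(diag(vcov(m)))[1]
t10 = (b0 - 10) / se  # t-statistic under H0: beta0 = 10
2 * pt(abs(t10), df = df.residual(m), lower.tail = FALSE)  # two-tailed p-value

t.test(d$FPS, mu = 10)  # equivalent via the one-sample t-test special case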
Intermission…
One-parameter (β0) "mean-only" model:
Description: b0 describes the mean of Y.
Prediction: b0 is the predicted value that minimizes the sample SSE.
Inference: Use b0 to test if β0 = 0 (default) or any other value. Equivalent to the one-sample t-test.
Two-parameter (β0, β1) model:
Description: b1 describes how Y changes as a function of X1. b0 describes the expected value of Y at a specific value (0) of X1.
Prediction: b0 and b1 yield predicted values that vary by X1 and minimize SSE in the sample.
Inference: Test if β1 = 0 (equivalent to Pearson's r or the independent-sample t-test). Test if β0 = 0 (analogous to a one-sample t-test controlling for X1, if X1 is mean-centered). Very flexible!