Unit 3: Inferences about a Single Mean (1 Parameter models)

Units 3-4 Organization
First consider the details of the simplest model (one parameter estimate; mean-only model; no Xs).
Next examine simple regression (two parameter estimates; one X for one quantitative predictor variable).
These provide a critical foundation for all linear models.
Subsequent units will generalize to one dichotomous predictor variable (Unit 5), multiple predictor variables (Units 6-7), and beyond.

Linear Models as Models
Linear models (including regression) are 'models': DATA = MODEL + ERROR
Three general uses for models:
Describe and summarize DATA (Ys) in a simpler form using MODEL.
Predict DATA (Ys) from MODEL. We will want to know the precision of prediction. How big is the error? Less error means better prediction.
Understand (test inferences about) complex relationships between individual regressors (Xs) in MODEL and the DATA (Ys). How precise are the estimates of each relationship?
MODELS are simplifications of reality. As such, there is ERROR. They also make assumptions that must be evaluated.

Fear Potentiated Startle (FPS)
We are interested in producing anxiety in the laboratory. To do this, we develop a procedure in which we expose people to periods of unpredictable electric shock administration alternating with periods of safety. We measure their startle response in the shock and safe periods. We use the difference in startle between the two periods (shock minus safe) to determine if they are anxious. This difference is called fear-potentiated startle (FPS). Our procedure works if FPS > 0. We need a model of FPS scores to determine if FPS > 0.

Fear Potentiated Startle: One parameter model
A very simple model for the population of FPS scores would predict the same value for everyone in the population: $\hat{Y}_i = \beta_0$
We would like this value to be the "best" prediction. In the context of DATA = MODEL + ERROR, how can we quantify "best"?
We want to predict some characteristic of the population of FPS scores that minimizes the ERROR from our model.
ERROR = DATA - MODEL: $\varepsilon_i = Y_i - \hat{Y}_i$
There is an error ($\varepsilon_i$) for each population score. How can we quantify total model error?

Total Error
The sum of errors across all scores in the population isn't ideal because positive and negative errors will tend to cancel each other: $\sum (Y_i - \hat{Y}_i)$
The sum of the absolute values of the errors could work. If we selected $\beta_0$ to minimize the sum of the absolute values of the errors, $\beta_0$ would equal the median of the population: $\sum |Y_i - \hat{Y}_i|$
The sum of squared errors (SSE) could work. If we selected $\beta_0$ to minimize the sum of squared errors, $\beta_0$ would equal the mean of the population: $\sum (Y_i - \hat{Y}_i)^2$
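The link between each error criterion and its minimizing value can be checked numerically. The sketch below is only an illustration under assumptions: the scores are simulated (not the lecture's FPS data), and the names `sum_abs_error` and `sum_sq_error` are made up for this example. It searches for the constant prediction that minimizes each criterion and compares the result with the median and the mean.

```r
# Simulated scores (an assumption; NOT the FPS data from the lecture).
# An odd sample size makes the median a unique minimizer.
set.seed(1)
y <- rnorm(101, mean = 30, sd = 40)

# Total error for a constant prediction b0 under each criterion.
sum_abs_error <- function(b0) sum(abs(y - b0))
sum_sq_error  <- function(b0) sum((y - b0)^2)

# Search numerically for the constant that minimizes each criterion.
best_abs <- optimize(sum_abs_error, interval = range(y))$minimum
best_sq  <- optimize(sum_sq_error,  interval = range(y))$minimum

# The minimizers should agree (up to search tolerance) with the
# sample median and the sample mean, respectively.
c(best_abs = best_abs, median = median(y))
c(best_sq  = best_sq,  mean   = mean(y))
```

A numerical search is used here instead of the closed-form answer so that the result does not assume what it is trying to show.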

One parameter model for FPS
For the moment, let's assume we prefer to minimize SSE (more on that in a moment). You should then predict the population mean FPS for everyone: $\hat{Y}_i = \beta_0$ where $\beta_0 = \mu$
What is the problem with this model, and how can we fix it?
We don't know the population mean for FPS scores ($\mu$). We can collect a sample from the population and use the sample mean ($\bar{X}$) as an estimate of the population mean ($\mu$). $\bar{X}$ is an unbiased estimate of $\mu$.

Model Parameter Estimation
Population model:
$\hat{Y}_i = \beta_0$ where $\beta_0 = \mu$
$Y_i = \beta_0 + \varepsilon_i$
Estimate population parameters from sample:
$\hat{Y}_i = b_0$ where $b_0 = \bar{X}$
$Y_i = b_0 + e_i$

Least Squares Criterion
In ordinary least squares (OLS) regression and other least squares linear models, the model parameter estimates (e.g., $b_0$) are calculated such that they minimize the sum of squared errors (SSE) in the sample in which you estimate the model.
$SSE = \sum (Y_i - \hat{Y}_i)^2$
$SSE = \sum e_i^2$

Properties of Parameter Estimates
There are 3 properties that make a parameter estimate attractive.
Unbiased: The mean of the sampling distribution for the parameter estimate is equal to the value of that parameter in the population.
Efficient: The sample estimates are close to the population parameter. In other words, the narrower the sampling distribution for any specific sample size N, the more efficient the estimator. Efficient means a small SE for the parameter estimate.
Consistent: As the sample size increases, the sampling distribution becomes narrower (more efficient). Consistent means that as N increases, the SE for the parameter estimate decreases.

Least Squares Criterion
If the $\varepsilon_i$ are normally distributed, both the median and the mean are unbiased and consistent estimators.
The variance of the sampling distribution for the mean is: $\frac{\sigma^2}{N}$
The variance of the sampling distribution for the median is: $\frac{\pi \sigma^2}{2N}$
Therefore the mean is the more efficient parameter estimate. For this reason, we tend to prefer to estimate our models by minimizing the sum of squared errors.
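The efficiency comparison can be illustrated with a small simulation. This is a sketch under assumptions: the population is normal with an arbitrarily chosen SD (`sigma`) and sample size (`n_obs`), neither of which comes from the lecture data. Note also that the formula for the median's variance is a large-sample result, so the simulated value at a modest N will only be close to it, not identical.

```r
# Simulation settings (arbitrary choices for illustration only).
set.seed(2)
n_obs  <- 25     # sample size N
sigma  <- 10     # population standard deviation
n_reps <- 5000   # number of simulated samples

# Each row is one sample of size n_obs from a normal population.
samples <- matrix(rnorm(n_obs * n_reps, mean = 0, sd = sigma), nrow = n_reps)

# Build the two sampling distributions.
means   <- rowMeans(samples)
medians <- apply(samples, 1, median)

# Both estimators are centered near the population value (0) ...
c(mean_of_means = mean(means), mean_of_medians = mean(medians))

# ... but the sampling distribution of the mean is narrower.
c(var_mean   = var(means),   theory = sigma^2 / n_obs)
c(var_median = var(medians), theory_large_N = pi * sigma^2 / (2 * n_obs))
```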

Fear-potentiated startle during Threat of Shock
> FilePath = 'G:/LectureDataR'
> FileName = '3_SingleMean_FPS.dat'
> d = dfReadDat(file.path(FilePath,FileName))
> str(d)
'data.frame': 96 obs. of 1 variables:
 $ FPS: num
> head(d)
  FPS

Descriptives and Univariate Plots
> varDescribe(d)
    var n mean sd median min max skew kurtosis
FPS
> windows() #on MAC, use quartz()
> par('cex' = 1.5, 'lwd'=2)
> hist(d$FPS)

FPS Experiment: The Inference Details
Goal: Determine if our shock threat procedure is effective at potentiating startle (increasing startle during threat relative to safe).
Create a simple model of FPS scores in the population: $FPS = \beta_0$
Collect a sample of N = 96 to estimate $\beta_0$.
Calculate the sample parameter estimate ($b_0$) that minimizes SSE in the sample.
Use $b_0$ to test hypotheses:
H0: $\beta_0 = 0$
Ha: $\beta_0 \neq 0$

Estimating a one parameter model in R
> m = lm(FPS ~ 1, data = d)
> modelSummary(m)
lm(formula = FPS ~ 1, data = d)
Observations: 96
Linear model fit by least squares
Coefficients:
            Estimate SE t Pr(>|t|)
(Intercept)                        ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Sum of squared errors (SSE): , Error df: 95
R-squared:
Sum of squared errors (SSE): This tells us how well the model fits the data. Specifically, it is the sum of the squared differences between the predicted values and the actual participant scores, where each difference is $e_i = Y_i - \hat{Y}_i$.

Errors/Residuals: $e_i = Y_i - \hat{Y}_i$
R can report the error for each individual in the sample:
> modelErrors(m)
You can also calculate the SSE manually:
> sum(modelErrors(m)^2)
[1]

Coefficients (Parameter Estimates)
> modelSummary(m)
lm(formula = FPS ~ 1, data = d)
Observations: 96
Linear model fit by least squares
Coefficients:
            Estimate SE t Pr(>|t|)
(Intercept)                        ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Sum of squared errors (SSE): , Error df: 95
R-squared:
Estimate and SE: This is $b_0$, the unbiased sample estimate of $\beta_0$, and its standard error. It is also called the intercept in regression (more on this later).
$\hat{Y}_i = b_0$, so $\hat{Y}_i = 32.2$
> coef(m)
(Intercept)

Predicted Values: $\hat{Y}_i = 32.19$
You can get the predicted value for each individual in the sample using this model:
> modelPredictions(m)

Testing Inferences about $\beta_0$
> modelSummary(m)
lm(formula = FPS ~ 1, data = d)
Observations: 96
Linear model fit by least squares
Coefficients:
            Estimate SE t Pr(>|t|)
(Intercept)                        ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Sum of squared errors (SSE): , Error df: 95
R-squared:
t: This is the t-statistic to test the H0 that $\beta_0 = 0$.
Pr(>|t|): This is the probability (p-value) of obtaining a sample $b_0$ = 32.2 (or more extreme) if H0 is true ($\beta_0 = 0$).
Describe the logic of how this was determined, given your understanding of sampling distributions.

Sampling Distribution: Testing Inferences about $\beta_0$
H0: $\beta_0 = 0$; Ha: $\beta_0 \neq 0$
If H0 is true, the sampling distribution for $b_0$ will have a mean of 0. We can estimate the standard deviation of the sampling distribution with the SE for $b_0$.
$t(df = N - P) = \frac{b_0 - \beta_0}{SE_{b_0}} = \frac{32.19 - 0}{SE_{b_0}} = 8.40$
$b_0$ is approximately 8 standard deviations above the expected mean of the distribution if H0 is true.
> pt(8.40,95,lower.tail=FALSE) * 2
[1] e-13
The probability of obtaining a sample $b_0$ = 32.2 (or more extreme) if H0 is true is very low (< .05). Therefore we reject H0 and conclude that $\beta_0 \neq 0$ and that $b_0$ is our best (unbiased) estimate of it.
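The t-test logic can be reproduced step by step in base R. The sketch below rests on assumptions: it uses simulated scores in a hypothetical data frame `d_sim` (the lecture's data file is not available here), and it uses base R's `summary()` in place of the course helper `modelSummary()`. It computes $b_0$, its SE, the t-statistic, and the two-tailed p-value by hand, then prints the coefficient table from `lm()` for comparison.

```r
# Simulated stand-in for the FPS data (an assumption; not the real scores).
set.seed(3)
d_sim <- data.frame(FPS = rnorm(96, mean = 30, sd = 40))
m_sim <- lm(FPS ~ 1, data = d_sim)

n <- nrow(d_sim)   # sample size N
p <- 1             # number of parameters estimated (b0 only)

# In the mean-only model, b0 is the sample mean and its standard
# error is the sample SD divided by sqrt(N).
b0    <- mean(d_sim$FPS)
se_b0 <- sd(d_sim$FPS) / sqrt(n)

# t-statistic for H0: beta0 = 0, and its two-tailed p-value on N - P df.
t_stat <- (b0 - 0) / se_b0
p_val  <- 2 * pt(abs(t_stat), df = n - p, lower.tail = FALSE)

c(b0 = b0, SE = se_b0, t = t_stat, p = p_val)

# The same values appear in the coefficient table from lm().
coef(summary(m_sim))
```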

Statistical Inference and Model Comparisons
Statistical inference about parameters is fundamentally about model comparisons.
You are implicitly (t-test of parameter estimate) or explicitly (F-test of model comparison) comparing two different models of your data.
We follow Judd et al. and call these two models the compact model and the augmented model.
The compact model will represent reality as the null hypothesis predicts. The augmented model will represent reality as the alternative hypothesis predicts.
The compact model is simpler (fewer parameters) than the augmented model. It is also nested in the augmented model (i.e., it uses a subset of the augmented model's parameters).

Model Comparisons: Testing inferences about $\beta_0$
$\widehat{FPS}_i = \beta_0$
H0: $\beta_0 = 0$
Ha: $\beta_0 \neq 0$
Compact model: $\widehat{FPS}_i = 0$
Augmented model: $\widehat{FPS}_i = \beta_0$ (estimated by $b_0$)
We estimate 0 parameters (P = 0) in this compact model.
We estimate 1 parameter (P = 1) in this augmented model.
Choosing between these two models is equivalent to testing if $\beta_0 = 0$, as you did with the t-test.

Model Comparison Plots

Compact model: $\widehat{FPS}_i = 0$
Augmented model: $\widehat{FPS}_i = \beta_0$ (estimated by $b_0$)
We can compare (and choose between) these two models by comparing their total error (SSE) in our sample.
$SSE = \sum (Y_i - \hat{Y}_i)^2$
$SSE(C) = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - 0)^2$
> sum((d$FPS - 0)^2)
[1]
$SSE(A) = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - 32.19)^2$
> sum((d$FPS - coef(m)[1])^2)
> # equivalently: sum(modelErrors(m)^2)
[1]

Compact model: $\widehat{FPS}_i = 0$; SSE = 233,368.3; P = 0
Augmented model: $\widehat{FPS}_i = \beta_0$ (estimated by $b_0$); SSE = 133,888.3; P = 1
$F(P_A - P_C, N - P_A) = \frac{(SSE(C) - SSE(A)) / (P_A - P_C)}{SSE(A) / (N - P_A)}$
$F(1 - 0, 96 - 1) = \frac{(233{,}368.3 - 133{,}888.3) / (1 - 0)}{133{,}888.3 / (96 - 1)}$
F(1,95) = 70.59, p < .0001
> pf(70.59,1,95, lower.tail=FALSE)
[1] e-13
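The equivalence between this F-test and the earlier t-test can be checked directly. The sketch below is an illustration on simulated scores (an assumption; these are not the lecture's FPS data). It computes F from the two SSEs and shows that it equals the squared one-sample t-statistic.

```r
# Simulated scores (an assumption; NOT the FPS data from the lecture).
set.seed(4)
y <- rnorm(96, mean = 30, sd = 40)
n <- length(y)

# Compact model predicts 0 for everyone (P = 0 parameters estimated).
sse_c <- sum((y - 0)^2)
p_c   <- 0

# Augmented model predicts the sample mean for everyone (P = 1).
sse_a <- sum((y - mean(y))^2)
p_a   <- 1

# F-statistic for the model comparison and its p-value.
f_stat <- ((sse_c - sse_a) / (p_a - p_c)) / (sse_a / (n - p_a))
p_f    <- pf(f_stat, p_a - p_c, n - p_a, lower.tail = FALSE)

# One-sample t-statistic for H0: beta0 = 0.
t_stat <- mean(y) / (sd(y) / sqrt(n))

# F should equal t squared.
c(F = f_stat, t_squared = t_stat^2, p = p_f)
```

The equality holds because SSE(C) - SSE(A) reduces to N times the squared sample mean, and SSE(A) / (N - 1) is the sample variance.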

Sampling Distribution vs. Model Comparison
The two approaches to testing H0 about parameters ($\beta_0$, $\beta_j$) are statistically equivalent.
They are complementary approaches with respect to conceptual understanding of GLMs.
Sampling distribution:
  Focus on population parameters and their estimates
  Tight connection to sampling and probability distributions
  Understanding of SE (sampling error/power; confidence intervals; graphic displays)
Model comparison:
  Focus on the models themselves
  Highlights model fit (SSE) and model parsimony (P)
  Clearer link to PRE ($\eta_p^2$)
  Can test comparisons that differ by > 1 parameter (discouraged)

Effect Sizes
Your parameter estimates are descriptive. They describe effects in the original units of the IVs and DV. Report them in your paper.
There are many other effect size estimates available. You will learn two that we prefer.
Partial eta-squared ($\eta_p^2$): Judd et al. call this PRE (proportional reduction in error).
Eta-squared ($\eta^2$): This is also commonly referred to as $R^2$ in regression.

Partial Eta-squared or PRE
Compact model: $\widehat{FPS}_i = 0$; SSE = 233,368.3; P = 0
Augmented model: $\widehat{FPS}_i = \beta_0$ (estimated by $b_0$); SSE = 133,888.3; P = 1
How much was the error reduced in the augmented model relative to the compact model?
$\frac{SSE(C) - SSE(A)}{SSE(C)} = \frac{233{,}368.3 - 133{,}888.3}{233{,}368.3} = .426$
Our more complex model that includes $\beta_0$ reduces prediction error (SSE) by approximately 43%. Not bad!
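The PRE arithmetic can be reproduced from the two SSE values reported above. The sketch also converts PRE into F using the algebraically equivalent form F = (PRE / (PA - PC)) / ((1 - PRE) / (N - PA)). The variable names are chosen for this example only.

```r
# SSE values and model sizes as reported on the slides.
sse_c <- 233368.3   # compact model,   P = 0
sse_a <- 133888.3   # augmented model, P = 1
n   <- 96
p_c <- 0
p_a <- 1

# Proportional reduction in error (partial eta-squared).
pre <- (sse_c - sse_a) / sse_c

# The same F as the model comparison, expressed through PRE.
f_from_pre <- (pre / (p_a - p_c)) / ((1 - pre) / (n - p_a))

round(c(PRE = pre, F = f_from_pre), 3)
```

The results should come out close to the PRE of .426 and the F(1,95) of 70.59 reported for the model comparison.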

Confidence Interval for b0
A confidence interval (CI) is an interval around a parameter estimate in which you can be fairly confident that you will capture the true population parameter (in this case, $\beta_0$). Most commonly reported is the 95% CI. Across repeated samples, 95% of the calculated CIs will include the population parameter.
> confint(m)
            2.5 % 97.5 %
(Intercept)
Given what you now know about confidence intervals and sampling distributions, what should the formula be?
$CI(b_0) = b_0 \pm t(\alpha; N - P) \times SE_{b_0}$
For the 95% confidence interval, this is approximately $\pm 2$ SEs around our unbiased estimate of $\beta_0$.
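The CI formula can be checked against `confint()` in base R. This sketch is an illustration on simulated scores in a hypothetical data frame `d_sim` (an assumption; not the lecture's data). It also shows why "approximately 2 SEs" is a reasonable rule of thumb at this sample size.

```r
# Simulated stand-in for the FPS data (an assumption; not the real scores).
set.seed(5)
d_sim <- data.frame(FPS = rnorm(96, mean = 30, sd = 40))
m_sim <- lm(FPS ~ 1, data = d_sim)

n <- nrow(d_sim)   # sample size N
p <- 1             # number of parameters estimated

# Parameter estimate and its standard error.
b0    <- unname(coef(m_sim)[1])
se_b0 <- sd(d_sim$FPS) / sqrt(n)

# Critical t for a 95% interval (alpha = .05, two-tailed) on N - P df.
t_crit <- qt(1 - 0.05 / 2, df = n - p)
t_crit   # close to 2, hence the "about 2 SEs" rule of thumb

# Manual interval: b0 plus or minus t_crit * SE.
ci_manual <- c(b0 - t_crit * se_b0, b0 + t_crit * se_b0)
ci_manual

# Interval reported by R for the same model.
confint(m_sim)
```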

Confidence Interval for b0
How can we tell if a parameter is "significant" from the confidence interval?
If a parameter differs significantly from 0 at $\alpha = .05$, then the 95% confidence interval around its estimate will not include 0.
The same logic applies to testing whether the population parameter equals any other (non-zero) value: reject that value at $\alpha = .05$ if the 95% confidence interval does not include it.

The one parameter (mean-only) model: Special Case
What special case (specific analytic test) is statistically equivalent to the test of the null hypothesis $\beta_0 = 0$ in the one parameter model?
The one-sample t-test, testing if a population mean = 0.
> t.test(d$FPS)
	One Sample t-test
data: d$FPS
t = , df = 95, p-value = 4.261e-13
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x

Testing $\beta_0$ = non-zero values
How could you test an H0 that $\beta_0$ equals some value other than 0 (e.g., 10)? HINT: There are at least three methods.
Option 1: Compare the SSE for the augmented model ($\hat{Y}_i = \beta_0$) to the SSE from a different compact model for this new H0 ($\hat{Y}_i = 10$).
Option 2: Recalculate the t-statistic using this new H0: $t = \frac{b_0 - 10}{SE_{b_0}}$
Option 3: Does the confidence interval for the parameter estimate contain this other value? No p-value is provided.
> confint(m)
            2.5 % 97.5 %
(Intercept)
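The three options can be run side by side to confirm they agree. The sketch below is an illustration on simulated scores (an assumption; not the lecture's FPS data), with `h0_value` as a name made up for this example. It also calls base R's `t.test()` with its `mu` argument as a cross-check on Option 2.

```r
# Simulated scores (an assumption; NOT the FPS data from the lecture).
set.seed(6)
y <- rnorm(96, mean = 30, sd = 40)
n <- length(y)
h0_value <- 10   # the non-zero value of beta0 under H0

# Option 1: model comparison. The compact model predicts 10 for everyone.
sse_c  <- sum((y - h0_value)^2)
sse_a  <- sum((y - mean(y))^2)
f_stat <- ((sse_c - sse_a) / 1) / (sse_a / (n - 1))
p_f    <- pf(f_stat, 1, n - 1, lower.tail = FALSE)

# Option 2: recalculate the t-statistic against the new H0.
t_stat <- (mean(y) - h0_value) / (sd(y) / sqrt(n))
p_t    <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Option 3: check whether the 95% CI excludes the H0 value.
ci <- confint(lm(y ~ 1))
excludes_h0 <- h0_value < ci[1, 1] || h0_value > ci[1, 2]

c(F = f_stat, t_squared = t_stat^2, p_F = p_f, p_t = p_t)
excludes_h0

# Cross-check: one-sample t-test against mu = 10.
t.test(y, mu = h0_value)
```

Options 1 and 2 give the same p-value because F equals t squared here, and Option 3 excludes the H0 value exactly when that p-value is below .05.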

Intermission
One parameter ($\beta_0$) "mean-only" model
  Description: $b_0$ describes the mean of Y.
  Prediction: $b_0$ is the predicted value that minimizes sample SSE.
  Inference: Use $b_0$ to test if $\beta_0 = 0$ (default) or any other value. One-sample t-test.
Two parameter ($\beta_0$, $\beta_1$) model
  Description: $b_1$ describes how Y changes as a function of $X_1$. $b_0$ describes the expected value of Y at a specific value (0) of $X_1$.
  Prediction: $b_0$ and $b_1$ yield predicted values that vary by $X_1$ and minimize SSE in the sample.
  Inference: Test if $\beta_1 = 0$. Pearson's r; independent-samples t-test. Test if $\beta_0 = 0$. Analogous to the one-sample t-test controlling for $X_1$, if $X_1$ is mean-centered. Very flexible!
