Regression Analysis: Statistical Inference


1 Regression Analysis: Statistical Inference

2 Simple Linear Regression Model (SLR)
Assume the relationship is linear: y = β0 + β1x + ε, where
y = dependent variable
x = independent variable
β0 = y-intercept
β1 = slope
ε = random error
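As a quick illustration of estimating this model (the data below are made up for the sketch, not from the presentation), the least-squares estimates b0 and b1 can be computed directly from the (x, y) pairs:

```python
# Minimal sketch: least-squares estimates for the SLR model y = b0 + b1*x + e.
# Illustrative data, not from the presentation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 6.2, 8.1, 9.9]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)                          # SSxx
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))     # SSxy

b1 = ss_xy / ss_xx        # estimated slope
b0 = y_bar - b1 * x_bar   # estimated intercept
print(b0, b1)
```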

3 Random Error Component (ε)
ε makes this a probabilistic model: it represents uncertainty, the random variation not explained by x.
Deterministic Model = exact relationship. Examples:
Temperature: °F = (9/5)°C + 32
Accounting: Assets = Liabilities + Equity
Probabilistic Model = Deterministic Model + Error

4 Model Parameters β0 and β1
The parameters β0 and β1 are estimated from the data, which are collected as pairs (x, y).

5 Model Assumptions
E(ε) = 0
Var(ε) = σ²
ε is normally distributed
The εi are independent
Before performing regression analysis, these assumptions should be validated.

6 Assumptions for Regression
Recall that the linear regression model has the form Y = β0 + β1X + ε, where the true relationship is unknown. When you perform a regression analysis, several assumptions about the distribution of the error terms must be met for the hypothesis tests and confidence intervals to be valid. The assumptions, summarized as εi ~ i.i.d. N(0, σ²), are that the error terms:
have a mean of 0 at each value of the predictor variable
are normally distributed at each value of the predictor variable
have the same variance at each value of the predictor variable
are independent of one another (thus i.i.d.).
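These assumptions are usually judged from residual plots. A small sketch (made-up data) shows why the overall residual average is not informative: least-squares residuals sum to zero by construction, so the mean-zero, equal-variance, and independence assumptions have to be checked locally, e.g. by plotting residuals against x:

```python
# Sketch of a residual-based assumption check (illustrative data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 2.1, 2.9, 4.2, 4.8, 6.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - (b0 + b1*x_i); these are what get plotted vs. x
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
mean_resid = sum(residuals) / n   # ~0 by construction, regardless of fit quality
```

Because mean_resid is always (numerically) zero, curvature or funnel shapes in the residual-vs-x plot are the actual evidence against the assumptions.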

7 Scatter Plot of Correct Model
R² = 0.67. To illustrate the importance of plotting data, consider the following four examples. In each example, the scatter plot of the data values is different; however, the fitted regression equation and the R-square statistic are the same. In the first plot, a regression line adequately describes the data.

8 Scatter Plot of Curvilinear Model
R² = 0.67. In the second plot, a simple linear regression model is not appropriate because you would be fitting a straight line through a curvilinear relationship.

9 Scatter Plot of Outlier Model
R² = 0.67. In the third plot, there is an outlying data value that is affecting the regression line.

10 Scatter Plot of Influential Model
R² = 0.67. In the fourth plot, the outlying data point dramatically changes the fit of the regression line. In fact, the slope would be undefined without the outlier.
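The four plots described above closely match Anscombe's classic quartet (same fitted line and R² ≈ 0.67 for very different shapes). As a sketch, fitting the first two of those well-known datasets (values quoted from Anscombe's 1973 paper; treat them as an assumption here) gives essentially identical lines even though only the first relationship is linear:

```python
# Anscombe's quartet, datasets I (roughly linear) and II (curvilinear).
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def fit(xs, ys):
    """Return the least-squares (intercept, slope)."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(xs, ys)) / \
         sum((xi - xb) ** 2 for xi in xs)
    return yb - b1 * xb, b1

(b0_1, b1_1), (b0_2, b1_2) = fit(x, y1), fit(x, y2)
# Both fits come out near y-hat = 3.0 + 0.5x, which is why the numeric
# summary alone cannot distinguish the plots: you have to look at the data.
```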

11 Homogeneous Variance

12 Heterogeneous Variance

13 Model Assumptions (Cont.)
Recall ε ~ N(0, σ²). σ² is unknown and must be estimated. Recall that in the one-sample case the sample variance s² estimates σ²; in regression, the Mean Squared Error (MSE) estimates σ².

14 Degrees of Freedom (df)
In general, the df associated with the estimation of σ² in regression is n - (k + 1), where
n = sample size
k = number of independent variables
"1" represents the intercept

15 Degrees of Freedom - Example
Model: y = β0 + β1x1 + β2x2 + β3x3 + ε
The degrees of freedom associated with this model are n - (3 + 1) = n - 4.

16 What Does MSE Mean? (see Central Company Output)
Just as with the sample variance, a more intuitive meaning comes from the standard deviation s = √MSE: approximately 95% of the observed y values should fall within ±2s of their predicted values.
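A short sketch (made-up data) of how MSE estimates σ² in simple linear regression, using the df rule from the slides above:

```python
# Sketch: MSE = SSE / (n - 2) estimates sigma^2 in SLR (illustrative data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # error sum of squares
mse = sse / (n - 2)   # df = n - (k + 1) with k = 1 predictor
s = mse ** 0.5        # roughly 95% of y values fall within +/- 2s of y-hat
```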

17 Inferences about 1 Goal is to model the relationship between x and y via y = 0 + 1x +  What does it mean if there is no relationship?

18 Inferences about 1 (Cont.)
Graphically...

19 Inferences about 1 (Cont.)
What hypothesis are we interested in? We want to test whether 1 is significantly different from 0 That is, H0: 1 = 0 H1: 1  0

20 Inferences about 1 (Cont.)
Need sampling dist. of the est. for 1 FACT: For the model y = 0 + 1x + , with N(0, 2), the LS estimator of 1 is normal with a mean of 1 and a variance of 2/SSxx.

21 Inferences about 1 (Cont.)
Test Statistic Has (n-2) degrees of freedom for SLR 1 normally will be 0 because we just want to determine if there is a relationship between x and y

22 Hypothesis Test for 1 Null Hypothesis: H0: 1 = 0
Alternative Hypothesis H1: 1 < 0 H1: 1 > 0 H1: 1  0 Test Statistic Rejection Region - Rej. H0 if tobs < -t,df - Rej. H0 if tobs > t,df - Rej. H0 if tobs < -t/2,df or if tobs > t/2,df Decision and Conclusion in terms of problem
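The whole two-tailed test can be sketched end to end (illustrative data; the critical value t0.025,3 = 3.182 is a standard t-table entry, hardcoded here as an assumption to keep the sketch dependency-free):

```python
# Sketch: two-tailed t test of H0: beta1 = 0 for SLR (illustrative data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = (sse / (n - 2)) ** 0.5    # sqrt(MSE)
s_b1 = s / ss_xx ** 0.5       # standard error of b1

t_obs = b1 / s_b1             # test statistic under H0: beta1 = 0
t_crit = 3.182                # t_{0.025, df=3} from a t table (assumption)
reject = abs(t_obs) > t_crit  # two-tailed rejection rule
```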

23 Confidence Interval for 1
A 100(1-)% confidence interval (CI) for 1 is given by Interpretation: We are 100(1-)% confident that the true mean change in response per unit change in x is within the LCL and the UCL for 1. What affects CI? confidence level sample size

24 Inferences about Slope - Example 1
The director of admissions of a small college administered a newly designed entrance test to 20 students selected at random from the new freshman class in a study to determine whether a student’s grade point average (GPA) at the end of the freshman year (y) can be predicted from the entrance test score (x).

25 Inferences about Slope - Example 1 (Cont.)
Obtain the least-squares estimates of β0 and β1, and state the estimated regression equation.

26 Inferences about Slope - Example 1 (Cont.)
Obtain a 99% confidence interval for β1 and interpret it.
α = 0.01, α/2 = 0.005, df = 20 - 2 = 18, so t0.005,18 = 2.878.
99% confidence interval: b1 ± (2.878) s(b1) = (0.4253, )
Interpretation: We are 99% confident that the true value of β1 is contained in the above interval.
Meaning: we are 99% confident that mean freshman GPA changes by an amount within this interval for each one-point increase in entrance test score.


28 F Test for Linear Regression Model
To test H0: β1 = β2 = … = βk = 0 versus Ha: at least one of β1, β2, …, βk is not equal to 0.
Test statistic: F(model) = (SSR/k) / (SSE/(n - (k + 1))) = MSR/MSE
Reject H0 in favor of Ha if F(model) > Fα or p-value < α.
Fα is based on k numerator and n - (k + 1) denominator degrees of freedom.


31 The Partial F Test: Testing the Significance of a Portion of a Regression Model
To test H0: βg+1 = βg+2 = … = βk = 0 versus Ha: at least one of βg+1, βg+2, …, βk is not equal to 0.
Partial F statistic: F = ((SSE(reduced) - SSE(full))/(k - g)) / (SSE(full)/(n - (k + 1)))
Reject H0 in favor of Ha if F > Fα or p-value < α.
Fα is based on k - g numerator and n - (k + 1) denominator degrees of freedom.

32 Multiple Regression Salsberry Realty

33 Estimation & Prediction
The fitted SLR model is ŷ = b0 + b1x. Estimating the mean of y at a given value of x, say xp, yields the same point value as predicting an individual y at xp. The difference is in the precision of the estimates, i.e., in the sampling errors.

34 Estimation & Prediction (Cont.)
Sampling error for the estimate of the mean of y at xp: s√(1/n + (xp - x̄)²/SSxx)
Sampling error for the prediction of an individual y at xp: s√(1 + 1/n + (xp - x̄)²/SSxx)

35 Estimation & Prediction (Cont.)
A 100(1 - α)% confidence interval for the mean of y at x = xp is given by
ŷ ± tα/2,n-2 · s√(1/n + (xp - x̄)²/SSxx)

36 Estimation & Prediction (Cont.)
A 100(1 - α)% prediction interval for an individual y at x = xp is given by
ŷ ± tα/2,n-2 · s√(1 + 1/n + (xp - x̄)²/SSxx)

