Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 7: Regression.


1 Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 7: Regression

2 Linear Regression The underlying analysis is the same as that of correlation, but with a prediction of a causal relation. The results will be the same, but the regression framework is used when we anticipate, through theory or experimentation, that one variable will influence another. The framework can also be extended to analyze multiple causes and to separate their unique levels of influence on a dependent variable.

3 Breast Cancer and Solar Radiation Let’s return to our example of breast cancer rate as a function of solar radiation. Here, the direction of causality can be inferred, though without conducting an experiment, it cannot be proven.

4 Breast Cancer and Solar Radiation As in correlation, the regression line is the line that minimizes residuals (i.e., errors of prediction).

5 Fitting a Regression Line A linear function (i.e., a straight line) is defined by two parameters. The intercept, a, is the predicted value of Y when X = 0. The slope, b, is the change in Y associated with a one-unit change in X. Y-hat is the predicted value of Y estimated with the regression equation.
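The equation on this slide was an image and did not survive transcription; the standard linear prediction equation it describes is:

```latex
\hat{Y} = a + bX
```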

6 Breast Cancer and Solar Radiation Here, the residuals are the differences between the observed and predicted values of Y. To fit the line, we want to minimize errors, but given randomly distributed errors, their sum will equal zero. So, we minimize squared errors instead.
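The residual formula on this slide was an image in the original; in standard notation, the residual for observation i and the quantity being minimized are:

```latex
e_i = Y_i - \hat{Y}_i,
\qquad \text{minimize } \sum_i \left( Y_i - \hat{Y}_i \right)^2
```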

7 Calculating Regression Coefficients The formulas to calculate the intercept and slope are derived from the criterion of minimizing the squared residuals. This is often termed OLS (Ordinary Least Squares) regression.
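The coefficient formulas themselves were images in the original slide; the standard least-squares solutions they refer to are:

```latex
b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}
  = \frac{\mathrm{cov}_{XY}}{s_X^2},
\qquad
a = \bar{Y} - b\bar{X}
```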

8 What’s the predicted cancer rate for an area with solar radiation of 425?
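The slide's dataset and answer were not captured in the transcript, so here is a minimal sketch of how the prediction would be computed. The (radiation, cancer-rate) pairs below are hypothetical placeholders, not the data from the lecture:

```python
# Hypothetical (solar radiation, breast cancer rate) pairs -- illustrative only,
# not the dataset used on the slide.
xs = [200.0, 250.0, 300.0, 350.0, 400.0, 450.0, 500.0]
ys = [30.0, 29.0, 27.5, 26.0, 24.5, 23.5, 22.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: b = sum((X - X̄)(Y - Ȳ)) / sum((X - X̄)²)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)

# Intercept: a = Ȳ - b X̄
a = mean_y - b * mean_x

# Predicted cancer rate for an area with solar radiation of 425
y_hat = a + b * 425
print(f"b = {b:.4f}, a = {a:.4f}, predicted rate at X=425: {y_hat:.2f}")
```

With these made-up numbers the slope is negative (higher radiation, lower rate), matching the direction of the relationship discussed in the lecture.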

9 Standardized Regression When we work with standardized variables, both the calculations and the association with correlation become clearer. In this case, both variables are z-transformed into distributions with means of 0 and standard deviations of 1.

10 Standardized Regression Here, the intercept and slope are referred to as alpha (α) and beta (β), respectively. Note that α = 0 and that β must range from -1 to 1, as in correlation. In fact, β = r. Note also that β is in standard-deviation units. What does β = .25 mean? For every 1 sd change in X, the predicted Y score increases by .25 sd's.
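In standardized form the regression equation reduces to the following (using the standard result, stated on the slide, that the standardized slope equals the correlation):

```latex
\hat{z}_Y = \beta \, z_X,
\qquad \alpha = 0,
\qquad \beta = r
```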

11 Accuracy of Prediction Simply fitting a regression line with a given intercept and slope provides little information about the accuracy of prediction: the points could be close to or far from the line. Note that when using standardized scores, distance from the line is a function of the slope. We need a measure of fit that is sensitive to the magnitude of the residuals.

12 Standard Error of the Estimate In arriving at a measure of fit, we can begin with the idea of a standard deviation. If we knew nothing about a person's score on X, the best guess for their score on Y would be the mean of Y, and the standard deviation of Y would provide a measure of the accuracy of that guess.

13 Standard Error of the Estimate If we predict Y from a person's X score using the regression equation, we can calculate deviations from the predicted values rather than from the mean. This is the standard error of the estimate. Its square is the error variance: the portion of the total variance in Y not explained by scores on X.
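The formula for the standard error of the estimate was an image in the original slide; the standard form, with N - 2 degrees of freedom, is:

```latex
s_{Y \cdot X} = \sqrt{\frac{\sum \left( Y - \hat{Y} \right)^2}{N - 2}}
```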

14 Squared Correlation Coefficient Following the preceding logic, r² can be interpreted as the proportion of variance in Y explained by X (i.e., by deviations from the mean of X). SS denotes a sum of squared deviations.
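In SS notation, the relationship described above is:

```latex
r^2 = \frac{SS_Y - SS_{\text{residual}}}{SS_Y}
    = \frac{\sum (Y - \bar{Y})^2 - \sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}
```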

15 Influence of Extreme Values Extreme values bias regression coefficients in the same manner as correlation coefficients: they pull the line of best fit with inordinate strength.

16 Hypothesis Testing in Regression The null hypothesis is simply that the slope equals zero. This is equivalent to testing ρ = 0 in correlation: if the correlation is significant, so must the slope be. The actual significance of the slope is tested using a t-distribution. The logic is similar to all hypothesis testing: we compare the magnitude of the slope (b) to its standard error (i.e., the variability of slopes drawn from a population where the null is true).

17 Hypothesis Testing in Regression The t value is calculated by dividing the slope by its standard error. We then use a t distribution (similar to the normal distribution) to determine how likely it would be to find a slope as large as ours if the null hypothesis were true.
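The t formula on this slide was an image in the original; the standard form, evaluated with N - 2 degrees of freedom, is:

```latex
t = \frac{b}{s_b},
\qquad
s_b = \frac{s_{Y \cdot X}}{s_X \sqrt{N - 1}}
```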

