Introduction to Regression


1 Introduction to Regression

2 Figure (p. 564) Predicting the variance in academic performance from IQ and SAT scores. The overlap between IQ and academic performance indicates that 40% of the variance in academic performance can be predicted from IQ scores. Similarly, 30% of the variance in academic performance can be predicted from SAT scores. However, IQ and SAT also overlap, so that SAT scores contribute an additional prediction of only 10% beyond what is already predicted by IQ.

3

4 1. We are investigating only linear relationships.
2. For each x value, y is a random variable having a normal (bell-shaped) distribution. All of these y distributions have the same variance. Also, for a given value of x, the distribution of y-values has a mean that lies on the regression line. (Results are not seriously affected if departures from normal distributions and equal variances are not too extreme.)

5 The regression equation is obtained by first finding the error (or distance) between the actual data points and the predicted values on the line. Each error is then squared to make the values consistently positive. The goal of regression is to find the equation that produces the smallest total amount of squared error. Thus, the regression equation produces the “best fitting” line for the data points.
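The "smallest total amount of squared error" idea can be illustrated with a brute-force search. This is only a sketch: the five data points are invented for illustration, the function name is hypothetical, and real regression uses the closed-form equations rather than trying candidate lines one by one.

```python
# Hypothetical data, invented for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def total_squared_error(b, a):
    """Sum of squared distances between each actual Y and the predicted b*X + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Try slopes 0.0 to 1.0 and intercepts 0.0 to 4.0 in steps of 0.1.
candidates = [(i / 10, j / 10) for i in range(11) for j in range(41)]

# The "best fitting" line is the candidate with the smallest squared error.
best_b, best_a = min(candidates, key=lambda line: total_squared_error(*line))
print(best_b, best_a, total_squared_error(best_b, best_a))
```

Any other line on the grid gives a larger total squared error for these points.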

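6 The regression equation is defined by the slope constant, b = SP/SSX (the sum of products of deviations divided by the sum of squared deviations for X), and the Y-intercept, a = MY – bMX (where MX and MY are the means of X and Y), producing a linear equation of the form Y = bX + a. The equation can be used to compute a predicted Y value for each of the X values in the data.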
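As a minimal sketch of the slope and intercept formulas, the following computes SP, SSX, b, and a directly. The data and the function name fit_line are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Return (b, a) for the line b*X + a using b = SP/SSX and a = MY - b*MX."""
    n = len(xs)
    mx = sum(xs) / n                      # MX, the mean of X
    my = sum(ys) / n                      # MY, the mean of Y
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ssx = sum((x - mx) ** 2 for x in xs)  # sum of squared deviations for X
    b = sp / ssx
    a = my - b * mx
    return b, a

# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = fit_line(xs, ys)
predicted = [b * x + a for x in xs]   # one predicted Y for each X
print(b, a, predicted)
```

For these points SP = 6 and SSX = 10, so the slope is 0.6 and the intercept is 4 – 0.6(3) = 2.2.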

7 The simple concept behind multiple regression is that each new predictor variable provides more information and allows for more accurate predictions. Having two predictors in the equation will generally produce more accurate predictions (less error and smaller residuals) than can be obtained using either predictor by itself.

8

9 Figure (p. 550) Hypothetical data showing the relationship between SAT scores and GPA with a regression line drawn through the data points. The regression line defines a precise, one-to-one relationship between each X value (SAT score) and its corresponding Y value (GPA).

10 Figure (p. 551) Relationship between total cost and number of hours playing tennis. The tennis club charges a $25 membership fee plus $5 per hour. The relationship is described by a linear equation: Total cost = $5(number of hours) + $25, which has the form Y = bX + a with b = 5 and a = 25.
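The tennis-club equation can be written as a small function. The function and parameter names below are hypothetical; only the $5 hourly rate and $25 membership fee come from the slide.

```python
def total_cost(hours, hourly_rate=5, membership_fee=25):
    """Total cost in dollars: Y = bX + a with b = hourly_rate and a = membership_fee."""
    return hourly_rate * hours + membership_fee

# The intercept is the cost at zero hours; the slope is the added cost per hour.
for hours in (0, 1, 10):
    print(hours, total_cost(hours))
```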

11 Figure (p. 553) The distance between the actual data point (Y) and the predicted point on the line (Ŷ) is defined as Y – Ŷ. The goal of regression is to find the equation for the line that minimizes the total of these squared distances.

12 Figure 17.4 (p. 555) The scatterplot for the data in Example 17.1 is shown with the best-fitting straight line. The predicted Y values (Ŷ) are on the regression line. Unless the correlation is perfect (+1.00 or –1.00), there will be some error between the actual Y values and the predicted Y values. The larger the correlation is, the less the error will be.
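The link between correlation and error can be checked numerically: the squared error left over after fitting the line equals (1 – r²) times SSY. The sketch below uses invented data and hypothetical variable names, and is not the example from the textbook.

```python
import math

# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
ssx = sum((x - mx) ** 2 for x in xs)
ssy = sum((y - my) ** 2 for y in ys)

r = sp / math.sqrt(ssx * ssy)        # Pearson correlation
b = sp / ssx                         # slope of the regression line
a = my - b * mx                      # Y-intercept

# Error measured directly as the sum of squared (Y - predicted Y).
ss_error_direct = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
# The same quantity obtained from the correlation.
ss_error_from_r = (1 - r ** 2) * ssy

print(r, ss_error_direct, ss_error_from_r)
```

As r moves toward +1.00 or –1.00, the factor (1 – r²) shrinks toward zero, and so does the error.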

13 Figure (p. 558) (a) Scatter plot showing data points that perfectly fit the regression equation Ŷ = 1.6X – 2. Note that the correlation is r = +1.00. (b) Scatter plot for the data from the example. Notice that there is error between the actual data points and the predicted Y values of the regression line.

14 Figure (p. 563) The partitioning of SS and df for analysis of regression. The variability in the original Y scores (both SSY and dfY) is partitioned into two components: (a) the variability that is explained by the regression equation, and (b) the residual variability.

15 Table 17.1 (p. 563) A summary table showing the results from an analysis of regression.
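A minimal sketch of how the entries in an analysis-of-regression summary table fit together, assuming a single predictor (so df for regression is 1 and df for the residual is n – 2). The data are invented, and the variable names are hypothetical.

```python
# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
ssx = sum((x - mx) ** 2 for x in xs)
b = sp / ssx
a = my - b * mx
predicted = [b * x + a for x in xs]

# Partition the variability of the Y scores.
ss_y = sum((y - my) ** 2 for y in ys)                            # total
ss_regression = sum((p - my) ** 2 for p in predicted)            # explained
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # unexplained

# Degrees of freedom for one predictor.
df_regression = 1
df_residual = n - 2

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print("Source      SS     df   MS     F")
print(f"Regression  {ss_regression:.2f}   {df_regression}    {ms_regression:.2f}   {f_ratio:.2f}")
print(f"Residual    {ss_residual:.2f}   {df_residual}    {ms_residual:.2f}")
print(f"Total       {ss_y:.2f}   {n - 1}")
```

The two SS components add up to SSY, and the two df values add up to n – 1.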

16 Figure (p. 564) Predicting the variance in academic performance from IQ and SAT scores. The overlap between IQ and academic performance indicates that 40% of the variance in academic performance can be predicted from IQ scores. Similarly, 30% of the variance in academic performance can be predicted from SAT scores. However, IQ and SAT also overlap, so that SAT scores contribute an additional prediction of only 10% beyond what is already predicted by IQ.
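The percentages in the caption imply two further numbers that the figure does not state outright: how much of SAT's prediction is already covered by IQ, and how much variance the two predictors explain together. A short arithmetic sketch follows, with hypothetical variable names and the three percentages taken from the caption.

```python
# Percentages of variance in academic performance, from the figure caption.
iq_alone = 40     # predicted from IQ by itself
sat_alone = 30    # predicted from SAT by itself
sat_unique = 10   # added by SAT beyond what IQ already predicts

# The part of SAT's prediction that duplicates IQ's prediction.
shared_with_iq = sat_alone - sat_unique

# Variance predicted when both IQ and SAT are in the equation.
both_predictors = iq_alone + sat_unique

print(shared_with_iq, both_predictors)
```

Simply adding 40% and 30% would count the overlapping portion twice.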

17 Table (p. 566) Hypothetical data consisting of three scores for each person. Two of the scores, X1 and X2, are used to predict the Y score for each individual.

18 Table (p. 567) The predicted Y values and the residuals for the data in the preceding table. The predicted Y values were obtained using the values of X1 and X2 in the multiple-regression equation for each individual.
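The following sketch shows one way predicted values and residuals could be produced for a two-predictor equation of the form Ŷ = b1X1 + b2X2 + a, using the standard least-squares solution for two predictors. The scores are invented (they are not the values from the table), and all names are hypothetical. It also compares the residual error with what each predictor achieves alone.

```python
def mean(values):
    return sum(values) / len(values)

def sp(u, v):
    """Sum of products of deviations; sp(u, u) is the sum of squares SS."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))

# Hypothetical scores, invented for illustration.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 2, 6, 5]
y = [4, 3, 7, 6, 10, 9]

ss1, ss2, ss_y = sp(x1, x1), sp(x2, x2), sp(y, y)
sp12, sp1y, sp2y = sp(x1, x2), sp(x1, y), sp(x2, y)

# Least-squares weights for two predictors.
denom = ss1 * ss2 - sp12 ** 2
b1 = (sp1y * ss2 - sp12 * sp2y) / denom
b2 = (sp2y * ss1 - sp12 * sp1y) / denom
a = mean(y) - b1 * mean(x1) - b2 * mean(x2)

predicted = [b1 * u + b2 * v + a for u, v in zip(x1, x2)]
residuals = [obs - pred for obs, pred in zip(y, predicted)]

# Residual error with both predictors versus each predictor by itself.
ss_res_two = sum(r ** 2 for r in residuals)
ss_res_x1 = ss_y - sp1y ** 2 / ss1
ss_res_x2 = ss_y - sp2y ** 2 / ss2

print(b1, b2, a)
print(residuals)
print(ss_res_two, ss_res_x1, ss_res_x2)
```

The residuals sum to zero (up to rounding), and the two-predictor residual error cannot exceed the error from either single-predictor equation fitted to the same data.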

19 A significant F-ratio indicates that the regression equation predicts a significant portion (more than just chance) of the variance in the Y scores. To see how much weight each independent variable (IV) carries in the prediction, look at the beta coefficients.

