Correlation and Regression. Elementary Statistics (Larson, Farber), Chapter 9.


1 Correlation and Regression. Elementary Statistics (Larson, Farber), Chapter 9. [Figure: scatter plot of Accidents against Hours of Training]

2 Correlation
A correlation is a relationship between two variables: x, the explanatory (independent) variable, and y, the response (dependent) variable.
x = Hours of Training, y = Number of Accidents
x = Shoe Size, y = Height
x = Cigarettes smoked per day, y = Lung Capacity
x = Score on SAT, y = Grade Point Average
x = Height, y = IQ
What type of relationship exists between the two variables, and is the correlation significant?

3 Scatter Plots and Types of Correlation
Negative correlation: as x increases, y decreases. x = hours of training, y = number of accidents. [Figure: scatter plot of Accidents against hours of training]

4 Scatter Plots and Types of Correlation
Positive correlation: as x increases, y increases. x = SAT score, y = GPA. [Figure: scatter plot of GPA against SAT score]

5 Scatter Plots and Types of Correlation
No linear correlation. x = height, y = IQ. [Figure: scatter plot of IQ against height]

7 6 xx x y 8 78 2 92 5 90 12 58 15 43 9 74 6 81 AbsencesGrade Application 0246810121416 40 45 50 55 60 65 70 75 80 85 90 95 x Final Grade Absences

7 Correlation Coefficient
The correlation coefficient r is a measure of the strength and direction of a linear relationship between two variables. The range of r is from -1 to 1.
If r is close to 1, there is a strong positive correlation.
If r is close to -1, there is a strong negative correlation.
If r is close to 0, there is no linear correlation.

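8 Computation of r
        x     y     xy     x²     y²
1       8    78    624     64   6084
2       2    92    184      4   8464
3       5    90    450     25   8100
4      12    58    696    144   3364
5      15    43    645    225   1849
6       9    74    666     81   5476
7       6    81    486     36   6561
Sum    57   516   3751    579  39898
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \dfrac{7(3751) - (57)(516)}{\sqrt{7(579) - 57^2}\,\sqrt{7(39898) - 516^2}} = -0.975$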
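The table computation above can be reproduced with a short script. This is a minimal sketch, not part of the original slides; the function name `correlation` is an assumption chosen for illustration, and the data are the seven (absences, final grade) pairs from the table.

```python
import math

# Data from the slides: x = number of absences, y = final grade.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]


def correlation(xs, ys):
    """Pearson r computed from the raw sums, mirroring the table columns."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation(x, y)
print(round(r, 3))  # -0.975
```

Working from the column sums rather than from deviations keeps the code in step with the hand calculation on the slide.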

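9 Hypothesis Test for the Significance of r
r is the correlation coefficient for the sample. The correlation coefficient for the population is $\rho$ (rho).
The sampling distribution for r is a t-distribution with n - 2 d.f.
Standardized test statistic: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
For a two-tailed test for significance: $H_0: \rho = 0$ (no significant correlation), $H_a: \rho \neq 0$ (significant correlation).
For left-tailed and right-tailed tests of negative or positive significance: $H_0: \rho \geq 0$, $H_a: \rho < 0$ (left-tailed); $H_0: \rho \leq 0$, $H_a: \rho > 0$ (right-tailed).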

10 Test for Significance of r
You found the correlation between the number of times absent and the final grade to be r = -0.975. There were seven pairs of data. Test the significance of this correlation. Use $\alpha = 0.01$.
1. Write the null and alternative hypotheses: $H_0: \rho = 0$ (no significant correlation), $H_a: \rho \neq 0$ (significant correlation).
2. State the level of significance: $\alpha = 0.01$.
3. Identify the sampling distribution: a t-distribution with 5 degrees of freedom.

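11 Rejection Regions
4. Find the critical values: $\pm t_0 = \pm 4.032$.
5. Find the rejection regions: $t < -4.032$ or $t > 4.032$.
6. Find the test statistic: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{-0.975}{\sqrt{\dfrac{1 - (-0.975)^2}{7 - 2}}} = -9.811$
[Figure: t-distribution with rejection regions shaded below -4.032 and above 4.032]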

12 Decision
7. Make your decision: t = -9.811 falls in the rejection region (t < -4.032), so reject the null hypothesis.
8. Interpret your decision: there is a significant correlation between the number of times absent and final grades.
[Figure: t-distribution with critical values -4.032 and 4.032 marked]
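The test statistic and the decision rule in the steps above can be sketched in a few lines. This is an illustrative sketch only; the function name `t_statistic` is an assumption, and the critical value 4.032 is copied from the slides rather than computed.

```python
import math


def t_statistic(r, n):
    """Standardized test statistic for a sample correlation r from n pairs."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = -0.975, 7          # values from the slides
t = t_statistic(r, n)

# Critical value for a two-tailed test with 5 d.f., taken from the slides.
t_critical = 4.032
reject_h0 = abs(t) > t_critical

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

The unrounded result differs from the slide's -9.811 only in the third decimal place, which does not affect the decision.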

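13 Residuals
[Figure: scatter plot of revenue against advertising dollars (Ad $) with a fitted line]
$(x_i, y_i)$ = a data point
$(x_i, \hat{y}_i)$ = a point on the line with the same x-value
$d_i = y_i - \hat{y}_i$ is called a residual.
For the regression line, $\sum d_i^2$ is a minimum.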

14 The Line of Regression
Once you know there is a significant linear correlation, you can write an equation describing the relationship between the x and y variables. This equation is called the line of regression or least squares line.
The equation of a line may be written as y = mx + b, where m is the slope of the line and b is the y-intercept.
The line of regression is: $\hat{y} = mx + b$
The slope m is: $m = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
The y-intercept is: $b = \bar{y} - m\bar{x} = \dfrac{\sum y}{n} - m\dfrac{\sum x}{n}$

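15 Calculate m and b
Write the equation of the line of regression with x = number of times absent and y = final grade.
        x     y     xy     x²
1       8    78    624     64
2       2    92    184      4
3       5    90    450     25
4      12    58    696    144
5      15    43    645    225
6       9    74    666     81
7       6    81    486     36
Sum    57   516   3751    579
$m = \dfrac{7(3751) - (57)(516)}{7(579) - 57^2} = -3.924$
$b = \dfrac{516}{7} - (-3.924)\dfrac{57}{7} = 105.667$
The line of regression is: $\hat{y} = -3.924x + 105.667$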

16 Line of Regression
m = -3.924 and b = 105.667
The line of regression is: $\hat{y} = -3.924x + 105.667$
Note that the point $(\bar{x}, \bar{y}) = (8.143, 73.714)$ is on the line.
[Figure: scatter plot of Final Grade against Absences with the line of regression drawn]
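The slope and intercept can be checked numerically. The sketch below is an illustration under the same data; the function name `regression_line` is an assumption. It keeps m unrounded when computing b, so the intercept can differ from the slide's 105.667 in the third decimal place.

```python
# Data from the slides: x = number of absences, y = final grade.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]


def regression_line(xs, ys):
    """Return (m, b) of the least squares line, using the raw-sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n
    return m, b


m, b = regression_line(x, y)
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)

print(f"m = {m:.3f}, b = {b:.2f}")
# The point (x_bar, y_bar) lies on the line, up to floating-point error.
print(abs((m * x_bar + b) - y_bar) < 1e-9)
```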

17 Predicting y Values
The regression line can be used to predict values of y for values of x falling within the range of the data.
The regression equation for number of times absent and final grade is: $\hat{y} = -3.924x + 105.667$
Use this equation to predict the expected grade for a student with (a) 3 absences, (b) 12 absences.
(a) $\hat{y} = -3.924(3) + 105.667 = 93.895$
(b) $\hat{y} = -3.924(12) + 105.667 = 58.579$
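A small helper makes the "within the range of the data" restriction explicit. This is a sketch, not the slides' own method; `predict_grade` and its range check are assumptions for illustration, with the bounds 2 and 15 taken from the smallest and largest observed number of absences.

```python
def predict_grade(absences, m=-3.924, b=105.667, x_min=2, x_max=15):
    """Predict a final grade from the regression line.

    Refuses to extrapolate outside the observed range of absences.
    """
    if not x_min <= absences <= x_max:
        raise ValueError(
            f"x = {absences} is outside the data range [{x_min}, {x_max}]"
        )
    return m * absences + b


print(f"{predict_grade(3):.3f}")   # 93.895
print(f"{predict_grade(12):.3f}")  # 58.579
```

Raising an error for out-of-range inputs is one possible design choice; returning the value with a warning would be another.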

18 The Coefficient of Determination
The coefficient of determination, $r^2$, is the ratio of explained variation in y to the total variation in y.
The correlation coefficient of number of times absent and final grade is r = -0.975. The coefficient of determination is $r^2 = (-0.975)^2 = 0.9506$.
Interpretation: About 95% of the variation in final grades can be explained by the number of times a student is absent. The other 5% is unexplained and can be due to sampling error or to other variables such as intelligence or amount of time studied.
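The "explained over total variation" definition can be verified directly on the data. The following is a minimal sketch; it fits the line in deviation form, which is algebraically equivalent to the raw-sum slope formula, and all variable names are assumptions for illustration.

```python
# Data from the slides: x = number of absences, y = final grade.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares slope and intercept (unrounded).
m = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
b = y_bar - m * x_bar
y_hat = [m * xi + b for xi in x]

explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
total = sum((yi - y_bar) ** 2 for yi in y)                     # total variation

r_squared = explained / total
print(f"{r_squared:.2f}")  # 0.95
```

Because r is not rounded here, the ratio differs slightly from the slide's 0.9506 in the fourth decimal place.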

19 The Standard Error of Estimate
The standard error of estimate $s_e$ is the standard deviation of the observed $y_i$ values about the predicted value $\hat{y}_i$:
$s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$

20 The Standard Error of Estimate
Calculate $\hat{y}$ for each x.
        x     y     ŷ         (y - ŷ)²
1       8    78    74.275    13.8756
2       2    92    97.819    33.8608
3       5    90    86.047    15.6262
4      12    58    58.579     0.3352
5      15    43    46.807    14.4932
6       9    74    70.351    13.3152
7       6    81    82.123     1.2611
Sum                          92.767
$s_e = \sqrt{\dfrac{92.767}{7 - 2}} = 4.307$
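The residual table can be regenerated with the rounded coefficients from the slides (m = -3.924, b = 105.667). This is an illustrative sketch; the variable names are assumptions.

```python
import math

# Data from the slides: x = number of absences, y = final grade.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]
m, b = -3.924, 105.667  # rounded coefficients, as used on the slides

n = len(x)
y_hat = [m * xi + b for xi in x]
squared_residuals = [(yi - yh) ** 2 for yi, yh in zip(y, y_hat)]

# Print one row per data pair: x, y, predicted y, squared residual.
for xi, yi, yh, d2 in zip(x, y, y_hat, squared_residuals):
    print(f"{xi:3d} {yi:4d} {yh:8.3f} {d2:8.4f}")

sse = sum(squared_residuals)
s_e = math.sqrt(sse / (n - 2))
print(f"sum = {sse:.3f}, s_e = {s_e:.3f}")  # sum = 92.767, s_e = 4.307
```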

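21 Prediction Intervals
Given a specific linear regression equation and $x_0$, a specific value of x, a c-prediction interval for y is:
$\hat{y} - E < y < \hat{y} + E$
where
$E = t_c\, s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$
Use a t-distribution with n - 2 degrees of freedom. The point estimate is $\hat{y}$ and E is the maximum error of estimate.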

22 Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
1. Find the point estimate: $\hat{y} = -3.924(6) + 105.667 = 82.123$
The point (6, 82.123) is the point on the regression line with x-coordinate of 6.

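23 Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
2. Find E: $E = t_c\, s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} = 2.015(4.307)\sqrt{1 + \dfrac{1}{7} + \dfrac{7(6 - 8.143)^2}{7(579) - 57^2}} = 9.438$
At the 90% level of confidence, the maximum error of estimate is 9.438.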

24 Application
Construct a 90% prediction interval for a final grade when a student has been absent 6 times.
3. Find the endpoints: $\hat{y} - E = 82.123 - 9.438 = 72.685$ and $\hat{y} + E = 82.123 + 9.438 = 91.561$
When x = 6, the 90% prediction interval is from 72.685 to 91.561.
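The three application steps (point estimate, maximum error E, endpoints) can be combined into one function. This is a sketch under stated assumptions: the name `prediction_interval` is hypothetical, the rounded values m = -3.924, b = 105.667 and s_e = 4.307 come from the slides, and t_c = 2.015 is the standard t-table critical value for 90% confidence with 5 degrees of freedom, which the transcript does not state explicitly.

```python
import math

# x-values (absences) from the slides; only x is needed for E.
x = [8, 2, 5, 12, 15, 9, 6]


def prediction_interval(x0, xs, m, b, s_e, t_c):
    """Return (lower, upper, E) for the prediction interval of y at x = x0."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(a * a for a in xs)
    x_bar = sum_x / n
    y_hat = m * x0 + b  # point estimate on the regression line
    E = t_c * s_e * math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    return y_hat - E, y_hat + E, E


# t_c = 2.015 is an assumed table value (90% confidence, 5 d.f.).
lower, upper, E = prediction_interval(6, x, m=-3.924, b=105.667,
                                      s_e=4.307, t_c=2.015)
print(f"E = {E:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The interval is symmetric about the point estimate, so both endpoints are the same distance E from 82.123.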

25 Minitab Output
Regression Analysis
The regression equation is y = 106 - 3.92 x
Predictor    Coef       StDev     T        P
Constant     105.668    3.655     28.91    0.000
x            -3.9241    0.4019    -9.76    0.000
S = 4.307    R-Sq = 95.0%    R-Sq(adj) = 94.0%

