MVS 250: V. Katch STATISTICS Chapter 5 Correlation/Regression.

1 1 MVS 250: V. Katch STATISTICS Chapter 5 Correlation/Regression

2 2 Overview: Paired Data. Is there a relationship? If so, what is the equation? Use the equation for prediction.

3 3 Correlation

4 4 Definition: Correlation exists between two variables when one of them is related to the other in some way.

5 5 Assumptions 1. The sample of paired data (x, y) is a random sample. 2. The pairs of (x, y) data have a bivariate normal distribution.

6 6 Definition: A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

7 7 Scatter Diagram of Paired Data

9 9 Positive Linear Correlation. Scatter plots of y against x: (a) Positive, (b) Strong positive, (c) Perfect positive.

10 10 Negative Linear Correlation. Scatter plots of y against x: (d) Negative, (e) Strong negative, (f) Perfect negative.

11 11 No Linear Correlation. Scatter plots of y against x: (g) No correlation, (h) Nonlinear correlation.

12 12 Definition: The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample:

$$ r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{(SD_x)(SD_y)} $$

where $\sum xy/n$ is the mean of the cross products, $\sum x/n$ is the mean of the x variable, $\sum y/n$ is the mean of the y variable, $SD_x$ is the standard deviation of the x variable, and $SD_y$ is the standard deviation of the y variable.
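The definition above can be sketched in Python. This is a minimal illustration, not code from the course: the function name `pearson_r` and the sample values are made up, and it assumes that SDx and SDy are computed with divisor n (the population form), which is what the mean-of-cross-products version of the formula needs in order to give the usual r.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Linear correlation via (mean of xy - mean x * mean y) / (SDx * SDy).

    Assumption: SDx and SDy use divisor n (statistics.pstdev).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    return (mean_xy - mean(xs) * mean(ys)) / (pstdev(xs) * pstdev(ys))

# Made-up paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))
```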

13 13 Notation for the Linear Correlation Coefficient
n: number of pairs of data presented
Σ: denotes the addition of the items indicated
Σx/n: the mean of all x values
Σy/n: the mean of all y values
Σxy/n: the mean of the cross products [x times y, summed; divided by n]
r: linear correlation coefficient for a sample
ρ: linear correlation coefficient for a population

14 14 Rounding the Linear Correlation Coefficient r: Round to three decimal places. Use a calculator or computer if possible.

15 15 Properties of the Linear Correlation Coefficient r 1. -1 ≤ r ≤ 1. 2. The value of r does not change if all values of either variable are converted to a different scale. 3. The value of r is not affected by the choice of x and y: interchange x and y and the value of r will not change. 4. r measures the strength of a linear relationship.
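Properties 1 to 3 can be checked numerically. The sketch below uses made-up data and a hypothetical helper `pearson_r`; the "different scale" is assumed to be a positive rescaling (pounds to kilograms), since multiplying by a negative factor would flip the sign of r.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    # Mean-of-cross-products form; SDs assumed to use divisor n.
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    return (mean_xy - mean(xs) * mean(ys)) / (pstdev(xs) * pstdev(ys))

# Made-up paired data, for illustration only.
weights_lb = [0.5, 1.2, 2.0, 2.9, 3.1]
sizes = [1, 3, 2, 5, 4]

r_original = pearson_r(weights_lb, sizes)

# Property 2: convert x from pounds to kilograms (positive scale factor).
weights_kg = [w * 0.45359237 for w in weights_lb]
r_rescaled = pearson_r(weights_kg, sizes)

# Property 3: interchange x and y.
r_swapped = pearson_r(sizes, weights_lb)

print(-1 <= r_original <= 1)                # property 1
print(abs(r_original - r_rescaled) < 1e-9)  # property 2
print(abs(r_original - r_swapped) < 1e-9)   # property 3
```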

16 16 Interpreting the Linear Correlation Coefficient: If the absolute value of r exceeds the value in the Sig. Table, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation. Remember to use n - 2.

17 17 Common Errors Involving Correlation 1. Causation: It is wrong to conclude that correlation implies causality. 2. Averages: Averages suppress individual variation and may inflate the correlation coefficient. 3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.

18 18 Common Errors Involving Correlation [Figure: plot of Distance (feet), 0 to 250, and Time (seconds), 0 to 8]

19 19 Correlation is Not Causation [Diagram with elements A, B, and C]

20 20 Correlation Calculations: Rank Order Correlation (Rho); Pearson's r

21 21 Rank Order Correlation

Hits  Rank  HR  Rank    D   D²
   1    10   3     8    2    4
   2     9   4     7    2    4
   3     8   5     6    2    4
   4     7   1    10   -3    9
   5     6   7     4    2    4
   6     5   6     5    0    0
   7     4   2     9   -5   25
   8     3  10     1    2    4
   9     2   9     2    0    0
  10     1   8     3   -2    4

22 22 Rank Order Correlation, cont. (table as on slide 21; ΣD² = 58, N = 10)
Rho = 1 - [6(ΣD²) / N(N² - 1)]
Rho = 1 - [6(58) / 10(10² - 1)]
Rho = 1 - [348 / 10(100 - 1)]
Rho = 1 - [348 / 990]
Rho = 1 - 0.352
Rho = 0.648
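The rank-order calculation can be reproduced with a short Python sketch. The Hits and HR values are taken from the table; the helper name `rank_descending` is made up, and it assumes there are no tied values (true for this table), since ties would need averaged ranks.

```python
hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
home_runs = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]

def rank_descending(values):
    """Rank 1 goes to the largest value. Assumes no ties."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

rank_hits = rank_descending(hits)
rank_hr = rank_descending(home_runs)

# D is the difference between the two ranks for each row.
d = [a - b for a, b in zip(rank_hits, rank_hr)]
sum_d2 = sum(diff ** 2 for diff in d)

n = len(hits)
rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(sum_d2, round(rho, 3))
```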

23 23 Pearson's r

Hits  HR  xy
   1   3   3
   2   4   8
   3   5  15
   4   1   4
   5   7  35
   6   6  36
   7   2  14
   8  10  80
   9   9  81
  10   8  80

Σx/n = 5.5, Σy/n = 5.5, Σxy/n = 356/10 = 35.6, SDx = SDy = √8.25 ≈ 2.87 (divisor n), so (SDx)(SDy) = 8.25

$$ r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{(SD_x)(SD_y)} $$

r = [35.6 - (5.5)(5.5)] / 8.25
r = (35.6 - 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648
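The arithmetic on this slide can be recomputed directly from the Hits and HR columns. This is a sketch under one stated assumption: SDx and SDy are taken with divisor n (`statistics.pstdev`), which is the form that pairs with the mean of the cross products. The variable names are illustrative.

```python
from statistics import mean, pstdev

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
home_runs = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]

# Cross products x*y for each row, then their sum and mean.
cross_products = [x * y for x, y in zip(hits, home_runs)]
sum_xy = sum(cross_products)
mean_xy = sum_xy / len(hits)

mean_x = mean(hits)
mean_y = mean(home_runs)
sd_x = pstdev(hits)        # divisor n (assumption, see lead-in)
sd_y = pstdev(home_runs)

r = (mean_xy - mean_x * mean_y) / (sd_x * sd_y)
print(sum_xy, mean_xy, round(sd_x, 2), round(r, 3))
```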

24 24 Pearson’s r Excel Demonstration

25 25 Data from the Garbage Project

x Plastic (lb)   y Household size
0.27             2
1.41             3
2.19             3
2.83             6
2.19             4
1.81             2
0.85             1
3.05             5

Is there a significant linear correlation?

26 26 Data from the Garbage Project (same data as slide 25). Is there a significant linear correlation?

27 27 Data from the Garbage Project (same data as slide 25). Is there a significant linear correlation? r = 0.842, R² = 0.71
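The reported values can be checked from the eight (plastic, household size) pairs. The sketch below uses the deviation form of the correlation formula; the variable names are illustrative, not from the slides.

```python
from math import sqrt

plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

n = len(plastic_lb)
mean_x = sum(plastic_lb) / n
mean_y = sum(household) / n

# Sums of squared deviations and of cross deviations.
sxx = sum((x - mean_x) ** 2 for x in plastic_lb)
syy = sum((y - mean_y) ** 2 for y in household)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(plastic_lb, household))

r = sxy / sqrt(sxx * syy)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 2))
```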

28 28 Is there a significant linear correlation?
n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = -0.707 and 0.707 (Table R with n = 8 and α = 0.05)

TABLE R Critical Values of the Pearson Correlation Coefficient r

  n   α = .05   α = .01
  4    .950      .999
  5    .878      .959
  6    .811      .917
  7    .754      .875
  8    .707      .834
  9    .666      .798
 10    .632      .765
 11    .602      .735
 12    .576      .708
 13    .553      .684
 14    .532      .661
 15    .514      .641
 16    .497      .623
 17    .482      .606
 18    .468      .590
 19    .456      .575
 20    .444      .561
 25    .396      .505
 30    .361      .463
 35    .335      .430
 40    .312      .402
 45    .294      .378
 50    .279      .361
 60    .254      .330
 70    .236      .305
 80    .220      .286
 90    .207      .269
100    .196      .256

29 29 Is there a significant linear correlation? [Figure: number line for r from -1 to 1 with critical values r = -0.707 and r = 0.707; "Reject ρ = 0" in both tails, "Fail to reject ρ = 0" in the middle; sample data: r = 0.842.] Since 0.842 > 0.707, the test statistic does fall within the critical region. Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude that there is a significant linear correlation between the weights of discarded plastic and household size.
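The decision rule used here (compare |r| with the tabled critical value) can be sketched as a lookup. The dictionary below copies only the first few α = .05 entries of Table R; the names `TABLE_R_05` and `is_significant` are made up for this illustration, and the sketch does not interpolate for sample sizes missing from the table.

```python
# First rows of Table R, alpha = .05 column: n -> critical value of r.
TABLE_R_05 = {4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754,
              8: 0.707, 9: 0.666, 10: 0.632}

def is_significant(r, n, table=TABLE_R_05):
    """Return True if |r| exceeds the critical value for n pairs."""
    if n not in table:
        raise KeyError(f"n = {n} is not in this excerpt of the table")
    return abs(r) > table[n]

# Garbage Project example: r = 0.842 with n = 8 pairs.
reject_h0 = is_significant(0.842, 8)
print(reject_h0)
```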

30 30 Method 1: Test Statistic is t (follows format of earlier chapters)

31 31 Formal Hypothesis Test: To determine whether there is a significant linear correlation between two variables. Two methods. Both methods let H0: ρ = 0 (no significant linear correlation) and H1: ρ ≠ 0 (significant linear correlation).

32 32 Method 2: Test Statistic is r (uses fewer calculations). Test statistic: r. Critical values: Refer to Table R (no degrees of freedom).

33 33 Method 2: Test Statistic is r (uses fewer calculations). Test statistic: r. Critical values: Refer to Table A-6 (no degrees of freedom). [Figure: number line for r from -1 to 1 with critical values r = -0.811 and r = 0.811; "Reject ρ = 0" in both tails, "Fail to reject ρ = 0" in the middle; sample data: r = 0.828.]

34 34 Method 1: Test Statistic is t (follows format of earlier chapters). Test statistic:

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

Critical values: use Table T with degrees of freedom = n - 2.
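As a sketch of Method 1, the t statistic can be computed for the Garbage Project values given earlier (r = 0.842, n = 8). Table T is not reproduced in the transcript, so the critical value 2.447 below is an assumption taken from a standard t table (two-tailed, α = 0.05, 6 degrees of freedom). The function name is illustrative.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 pairs")
    return r / sqrt((1 - r ** 2) / (n - 2))

r, n = 0.842, 8
t = t_statistic(r, n)

# Assumed critical value: standard t table, two-tailed, alpha = 0.05, df = 6.
T_CRITICAL = 2.447
reject_h0 = abs(t) > T_CRITICAL
print(round(t, 2), reject_h0)
```

Both methods lead to the same decision for this example: the null hypothesis ρ = 0 is rejected.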

35 35 Testing for a Linear Correlation
Start
1. Let H0: ρ = 0 and H1: ρ ≠ 0.
2. Select a significance level α.
3. Calculate r using Formula 9-1.
4. METHOD 1: The test statistic is $t = r / \sqrt{(1 - r^2)/(n - 2)}$. Critical values of t are from Table A-3 with n - 2 degrees of freedom.
   METHOD 2: The test statistic is r. Critical values of r are from Table A-6.
5. If the absolute value of the test statistic exceeds the critical values, reject H0: ρ = 0. Otherwise, fail to reject H0.
6. If H0 is rejected, conclude that there is a significant linear correlation. If you fail to reject H0, then there is not sufficient evidence to conclude that there is a linear correlation.

36 36 Why does the critical value of r increase as sample size decreases? A correlation by chance is more likely.
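A small simulation can illustrate this point. The sketch below is an assumption-laden illustration, not part of the course material: it draws x and y independently from a standard normal distribution (so the true ρ is 0), and estimates the 95th percentile of |r| for a small and a large sample size. The function names, the seed, and the number of trials are arbitrary choices.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def chance_abs_r_percentile(n, trials=2000, seed=0):
    """Estimated 95th percentile of |r| when x and y are unrelated."""
    rng = random.Random(seed)
    abs_rs = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        abs_rs.append(abs(pearson_r(xs, ys)))
    abs_rs.sort()
    return abs_rs[(95 * trials) // 100]

q_small = chance_abs_r_percentile(5)    # few pairs
q_large = chance_abs_r_percentile(50)   # many pairs
print(round(q_small, 2), round(q_large, 2))
```

With few pairs, unrelated data produce large values of |r| much more often, so a larger critical value is needed before a correlation is called significant.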

37 37 Coefficient of Determination (Effect Size) r²: The part of the variance of one variable that can be explained by the variance of a related variable.
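As a worked instance, using the Garbage Project value reported on slide 27 (r = 0.842), the coefficient of determination is simply the square of r:

```latex
r = 0.842 \quad\Longrightarrow\quad r^2 = (0.842)^2 \approx 0.709 \approx 0.71
```

Read this way, about 71% of the variance in one variable can be explained by the variance in the other for that sample.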

38 38 Justification for r Formula. [Figure: scatterplot divided into Quadrants 1 to 4 by the lines x = x̄ = 3 and y = ȳ = 11, which cross at (x̄, ȳ), the centroid of the sample points. For the point (7, 23): x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12.]

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, S_x S_y} $$
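The deviation form of the formula can be sketched in Python and compared with the mean-of-cross-products form. The slide does not list its data points, so the sample below is hypothetical: it was chosen only so that the centroid is (3, 11) and the point (7, 23) is included. Here S_x and S_y are assumed to be sample standard deviations (divisor n - 1), matching the (n - 1) in the denominator.

```python
from statistics import mean, stdev, pstdev

# Hypothetical data with centroid (3, 11) that includes the point (7, 23).
xs = [1, 1, 2, 4, 7]
ys = [4, 6, 9, 13, 23]

x_bar, y_bar = mean(xs), mean(ys)

# Contribution of the point (7, 23): (7 - 3) * (23 - 11) = 4 * 12.
contribution = (7 - x_bar) * (23 - y_bar)

# Deviation form: sum of (x - x_bar)(y - y_bar) over (n - 1) * Sx * Sy.
n = len(xs)
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
r_deviation = cross / ((n - 1) * stdev(xs) * stdev(ys))

# Mean-of-cross-products form, with SDs using divisor n.
mean_xy = mean([x * y for x, y in zip(xs, ys)])
r_means = (mean_xy - x_bar * y_bar) / (pstdev(xs) * pstdev(ys))

print(contribution, round(r_deviation, 3), round(r_means, 3))
```

Points in Quadrants 1 and 3 give positive products and points in Quadrants 2 and 4 give negative ones, which is why the sum of products tracks the direction of the linear trend.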

