Published by Simon Tate. Modified over 8 years ago.
1 MVS 250: V. Katch STATISTICS Chapter 5 Correlation/Regression
2 Overview: Paired Data. Is there a relationship? If so, what is the equation? Use the equation for prediction.
3 Correlation
4 Definition: Correlation exists between two variables when one of them is related to the other in some way.
5 Assumptions: 1. The sample of paired data (x, y) is a random sample. 2. The pairs of (x, y) data have a bivariate normal distribution.
6 Definition: A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.
7 Scatter Diagram of Paired Data
9 Positive Linear Correlation [figure: three scatter plots of y against x: (a) Positive, (b) Strong positive, (c) Perfect positive]
10 Negative Linear Correlation [figure: three scatter plots of y against x: (d) Negative, (e) Strong negative, (f) Perfect negative]
11 No Linear Correlation [figure: two scatter plots of y against x: (g) No correlation, (h) Nonlinear correlation]
12 Definition: The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample.

$$r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{(SD_x)(SD_y)}$$

where Σxy/n is the mean of the cross products; Σx/n is the mean of the x variable; Σy/n is the mean of the y variable; SDx is the standard deviation of the x variable and SDy is the standard deviation of the y variable. For this form of the formula, both standard deviations are computed with n (not n − 1) in the denominator.
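As a rough illustration, the formula on this slide can be written as a short Python function. The function name `pearson_r` and the five data pairs are made up for this sketch and do not come from the slides; the standard deviations are assumed to be population values (dividing by n), which is what this form of the formula needs.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient from the mean of the cross products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    # Population standard deviations (divide by n), an assumption of this sketch.
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))  # 0.775
```

Here the mean cross product is 13.2, the product of the means is 12, and the two standard deviations multiply to about 1.549, so r is about 1.2 / 1.549.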
13 Notation for the Linear Correlation Coefficient
n: number of pairs of data presented.
Σ: denotes the addition of the items indicated.
Σx/n: denotes the mean of all x values.
Σy/n: denotes the mean of all y values.
Σxy/n: denotes the mean of the cross products [x times y, summed; divided by n].
r: linear correlation coefficient for a sample.
ρ: linear correlation coefficient for a population.
14 Rounding the Linear Correlation Coefficient r: Round to three decimal places. Use a calculator or computer if possible.
15 Properties of the Linear Correlation Coefficient r
1. −1 ≤ r ≤ 1.
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.
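The properties listed on this slide can be checked numerically. The sketch below uses hypothetical height and weight values (not from the slides) and a helper named `pearson_r` introduced only for this example; it converts one variable from inches to centimetres and swaps the roles of x and y.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores (helper defined for this sketch)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data, used only to illustrate the properties.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

r = pearson_r(heights_in, weights_lb)

# Property 2: converting inches to centimetres leaves r unchanged.
heights_cm = [h * 2.54 for h in heights_in]
r_rescaled = pearson_r(heights_cm, weights_lb)

# Property 3: interchanging x and y leaves r unchanged.
r_swapped = pearson_r(weights_lb, heights_in)

print(r, r_rescaled, r_swapped)
```

All three printed values agree up to floating-point rounding, and each lies between −1 and 1 (property 1).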
16 Interpreting the Linear Correlation Coefficient: If the absolute value of r exceeds the value in the significance table, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation. Remember to use n − 2.
17 Common Errors Involving Correlation 1. Causation: It is wrong to conclude that correlation implies causality. 2. Averages: Averages suppress individual variation and may inflate the correlation coefficient. 3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
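The linearity point can be made concrete with a small sketch. The data below are hypothetical: y is exactly x squared, so the two variables are perfectly related, yet the numerator of the r formula (mean cross product minus product of the means) is zero and so r is zero.

```python
from math import sqrt

# Hypothetical data: y depends on x exactly, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Numerator of r: mean of the cross products minus the product of the means.
numerator = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y

# Population standard deviations (divide by n).
sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)

r = numerator / (sd_x * sd_y)
print(r)  # 0.0
```

A scatterplot of these points would show a clear parabola, which is why a plot should accompany any reported r.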
18 Common Errors Involving Correlation [figure: plot of Distance (feet), 0 to 250, against Time (seconds), 0 to 8]
19 Correlation is Not Causation [figure: diagram relating A, B and C]
20 Correlation Calculations: Rank Order Correlation (Rho); Pearson's r
21 Rank Order Correlation

Hits  Rank  HR  Rank   D   D²
 1     10    3    8    2    4
 2      9    4    7    2    4
 3      8    5    6    2    4
 4      7    1   10   -3    9
 5      6    7    4    2    4
 6      5    6    5    0    0
 7      4    2    9   -5   25
 8      3   10    1    2    4
 9      2    9    2    0    0
10      1    8    3   -2    4
22 Rank Order Correlation, cont.
From the table of ranks: ∑D² = 58, N = 10.
Rho = 1 − [6(∑D²) / N(N² − 1)]
Rho = 1 − [6(58) / 10(10² − 1)]
Rho = 1 − [348 / 10(100 − 1)]
Rho = 1 − [348 / 990]
Rho = 1 − 0.352
Rho = 0.648
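The rank-order calculation can be reproduced in a few lines of Python. This is a minimal sketch, assuming there are no tied ranks; the function name `spearman_rho` is introduced here for illustration, and the HR rank of 10 in the fourth row is inferred from that row's D of −3 because the value is missing in the transcript.

```python
def spearman_rho(rank_x, rank_y):
    """Rank order correlation: 1 - 6 * sum(D^2) / (N * (N^2 - 1)), no ties assumed."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks from the slide's table (Hits rank, HR rank), one entry per row.
hits_rank = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
hr_rank = [8, 7, 6, 10, 4, 5, 9, 1, 2, 3]

sum_d2 = sum((a - b) ** 2 for a, b in zip(hits_rank, hr_rank))
rho = spearman_rho(hits_rank, hr_rank)
print(sum_d2)         # 58
print(round(rho, 3))  # 0.648
```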
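23 Pearson's r

Hits (x)  HR (y)  xy
 1          3      3
 2          4      8
 3          5     15
 4          1      4
 5          7     35
 6          6     36
 7          2     14
 8         10     80
 9          9     81
10          8     80

Σx/n = 5.5, Σy/n = 5.5, Σxy/n = 356/10 = 35.6
SDx = SDy = √8.25 ≈ 2.87 (computed with n in the denominator)

$$r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{(SD_x)(SD_y)}$$

r = [35.6 − (5.5)(5.5)] / [(2.87)(2.87)]
r = (35.6 − 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648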
24 Pearson’s r Excel Demonstration
25 Data from the Garbage Project

x Plastic (lb)   y Household size
0.27             2
1.41             3
2.19             3
2.83             6
2.19             4
1.81             2
0.85             1
3.05             5

Is there a significant linear correlation?
27 Data from the Garbage Project: Is there a significant linear correlation? r = 0.842, r² = 0.71
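The reported values can be checked directly from the eight data pairs. The sketch below is one way to do it in plain Python; the variable names and the helper `pearson_r` are chosen for this example and are not from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = [mean(xy) - mean(x) * mean(y)] / (SDx * SDy), with SDs dividing by n."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

# Garbage Project data from the slide: plastic discarded (lb), household size.
plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household_size = [2, 3, 3, 6, 4, 2, 1, 5]

r = pearson_r(plastic_lb, household_size)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 2))  # 0.842 0.71
```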
28 Is there a significant linear correlation?
n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = −0.707 and 0.707 (Table R with n = 8 and α = 0.05)

TABLE R Critical Values of the Pearson Correlation Coefficient r

n     α = .05   α = .01
4     .950      .999
5     .878      .959
6     .811      .917
7     .754      .875
8     .707      .834
9     .666      .798
10    .632      .765
11    .602      .735
12    .576      .708
13    .553      .684
14    .532      .661
15    .514      .641
16    .497      .623
17    .482      .606
18    .468      .590
19    .456      .575
20    .444      .561
25    .396      .505
30    .361      .463
35    .335      .430
40    .312      .402
45    .294      .378
50    .279      .361
60    .254      .330
70    .236      .305
80    .220      .286
90    .207      .269
100   .196      .256
29 Is there a significant linear correlation? Sample data: r = 0.842. Since 0.842 > 0.707, the test statistic does fall within the critical region. Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size. [figure: number line from −1 to 1 with critical values r = −0.707 and r = 0.707; reject ρ = 0 in both tails, fail to reject ρ = 0 between them]
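The decision rule on this slide amounts to a table lookup and a comparison. A minimal sketch follows; the dictionary holds only a few α = 0.05 entries copied from Table R, and the function name `is_significant` is an assumption of this example.

```python
# A few critical values of r at alpha = 0.05, copied from Table R (keyed by n).
TABLE_R_05 = {6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666, 10: 0.632}

def is_significant(r, n, table=TABLE_R_05):
    """Reject H0: rho = 0 when |r| exceeds the critical value for this n."""
    return abs(r) > table[n]

# Garbage Project result: r = 0.842 with n = 8 pairs.
decision = is_significant(0.842, 8)
print(decision)  # True, so H0 is rejected
```

Because the comparison uses the absolute value of r, a sample value of −0.842 would lead to the same decision.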
30 Method 1: Test Statistic is t (follows format of earlier chapters)
31 Formal Hypothesis Test: To determine whether there is a significant linear correlation between two variables. Two methods. Both methods let H0: ρ = 0 (no significant linear correlation) and H1: ρ ≠ 0 (significant linear correlation).
32 Method 2: Test Statistic is r (uses fewer calculations). Test statistic: r. Critical values: Refer to Table R (no degrees of freedom).
33 Method 2: Test Statistic is r (uses fewer calculations). Test statistic: r. Critical values: Refer to Table A-6 (no degrees of freedom). Sample data: r = 0.828. [figure: number line from −1 to 1 with critical values r = −0.811 and r = 0.811; reject ρ = 0 in both tails, fail to reject ρ = 0 between them]
34 Method 1: Test Statistic is t (follows format of earlier chapters)
Test statistic:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Critical values: use Table T with degrees of freedom = n − 2
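As a worked illustration of Method 1, the sketch below applies the t formula to the Garbage Project result (r = 0.842, n = 8). The critical value 2.447 is the usual two-tailed t value for α = 0.05 with 6 degrees of freedom from a standard t table; it is not stated on the slides and is labeled as an assumption here.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / sqrt((1 - r ** 2) / (n - 2))

r, n = 0.842, 8
t = t_statistic(r, n)

# Assumption: two-tailed critical t for alpha = 0.05 and 6 degrees of freedom,
# taken from a standard t table rather than from the slides.
T_CRITICAL = 2.447
reject_h0 = abs(t) > T_CRITICAL

print(round(t, 2), reject_h0)  # 3.82 True
```

Both methods lead to the same conclusion for these data: H0: ρ = 0 is rejected.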
35 Testing for a Linear Correlation
Start: Let H0: ρ = 0 and H1: ρ ≠ 0. Select a significance level α. Calculate r using Formula 9-1.
METHOD 1: The test statistic is

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Critical values of t are from Table A-3 with n − 2 degrees of freedom.
METHOD 2: The test statistic is r. Critical values of r are from Table A-6.
If the absolute value of the test statistic exceeds the critical values, reject H0: ρ = 0. Otherwise fail to reject H0.
If H0 is rejected, conclude that there is a significant linear correlation. If you fail to reject H0, then there is not sufficient evidence to conclude that there is a linear correlation.
36 Why does the critical value of r increase as sample size decreases? With fewer pairs, a large correlation is more likely to arise by chance.
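One way to see this is a small simulation. The sketch below draws many samples of two unrelated normal variables and records the value of |r| that is exceeded only about 5% of the time. The function names, the number of trials, and the seed are choices made for this example, and the results are Monte Carlo estimates rather than exact table values.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviation scores (helper defined for this sketch)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulated_critical_r(n, trials=4000, alpha=0.05, seed=0):
    """Estimate the |r| exceeded by chance with probability alpha when rho = 0."""
    rng = random.Random(seed)
    abs_rs = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # independent of xs
        abs_rs.append(abs(pearson_r(xs, ys)))
    abs_rs.sort()
    return abs_rs[int((1 - alpha) * trials)]

crit_small = simulated_critical_r(5)   # Table R lists .878 for n = 5
crit_large = simulated_critical_r(50)  # Table R lists .279 for n = 50
print(round(crit_small, 2), round(crit_large, 2))
```

The estimate for n = 5 comes out far larger than the one for n = 50, in line with the pattern in Table R.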
37 Coefficient of Determination (Effect Size) r²: The proportion of the variance of one variable that can be explained by the variance of a related variable.
38 Justification for r Formula

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, S_x S_y}$$

[figure: scatter plot divided into four quadrants by the lines x̄ = 3 and ȳ = 11, which cross at the centroid (x̄, ȳ) of the sample points; for the point (7, 23), x − x̄ = 7 − 3 = 4 and y − ȳ = 23 − 11 = 12]
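The quadrant argument can be illustrated with the Garbage Project data from the earlier slides. In this sketch, each product (x − x̄)(y − ȳ) is positive when a point lies in the upper-right or lower-left quadrant around the centroid and negative otherwise; the variable names are chosen for this example, and the sample standard deviations divide by n − 1 to match the formula on this slide.

```python
from math import sqrt

# Garbage Project data: plastic discarded (lb), household size.
plastic_lb = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household_size = [2, 3, 3, 6, 4, 2, 1, 5]

n = len(plastic_lb)
x_bar = sum(plastic_lb) / n
y_bar = sum(household_size) / n

# Sample standard deviations (divide by n - 1), matching the (n - 1) in the formula.
s_x = sqrt(sum((x - x_bar) ** 2 for x in plastic_lb) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in household_size) / (n - 1))

# One product of deviations per point; its sign depends on the point's quadrant.
products = [(x - x_bar) * (y - y_bar) for x, y in zip(plastic_lb, household_size)]
positive = sum(1 for p in products if p > 0)
negative = sum(1 for p in products if p < 0)

r = sum(products) / ((n - 1) * s_x * s_y)
print(positive, negative, round(r, 3))  # 7 1 0.842
```

Seven of the eight products are positive, so the sum is clearly positive and r is close to 1, matching the value of 0.842 reported earlier.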