Published by Britney Welch. Modified over 9 years ago.
1
Descriptive measures of the strength of a linear association: r-squared and the (Pearson) correlation coefficient r
2
Translating a research question into a statistical procedure
How strong is the linear relationship between skin cancer mortality and latitude?
–(Pearson) correlation coefficient r
–Coefficient of determination r²
3
Where does this topic fit in? Model formulation, model estimation, model evaluation, model use.
4
Situation #1 A very weak linear relationship
5
Situation #2 A fairly strong linear relationship
6
Coefficient of determination r²
r² is a number (a proportion!) between 0 and 1.
If r² = 1:
–all data points fall perfectly on the regression line
–the predictor x accounts for all of the variation in y
If r² = 0:
–the fitted regression line is perfectly horizontal
–the predictor x accounts for none of the variation in y
7
Interpretation of r²
r² × 100 percent of the variation in y is reduced by taking into account predictor x.
r² × 100 percent of the variation in y is “explained by” the variation in predictor x.
8
R-sq in Minitab fitted line plot
9
R-sq in Minitab regression output

The regression equation is
Mort = 389.189 - 5.97764 Lat

S = 19.1150   R-Sq = 68.0 %   R-Sq(adj) = 67.3 %

Analysis of Variance
Source      DF       SS       MS        F      P
Regression   1  36464.2  36464.2  99.7968  0.000
Error       47  17173.1    365.4
Total       48  53637.3
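As a check on where R-Sq comes from, the following minimal sketch recomputes it from the sums of squares in the ANOVA table above. The variable names are illustrative only; the numbers are copied from the Minitab output.

```python
# Sums of squares copied from the Minitab ANOVA table above.
ss_regression = 36464.2
ss_error = 17173.1
ss_total = 53637.3

# r-squared as the proportion of total variation explained by the regression.
r_sq_explained = ss_regression / ss_total

# Equivalent view: the proportional reduction in variation once x is used.
r_sq_reduction = 1 - ss_error / ss_total

print(f"{r_sq_explained:.3f}")  # 0.680, i.e. R-Sq = 68.0 %
print(f"{r_sq_reduction:.3f}")  # 0.680
```

Both expressions agree because the regression and error sums of squares add up to the total sum of squares.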
10
Pearson correlation coefficient r
r is a (unitless) number between -1 and 1, inclusive.
Sign of coefficient of correlation
–plus sign if slope of fitted regression line is positive
–negative sign if slope of fitted regression line is negative
If r² is represented in decimal form, e.g. 0.39 or 0.87, then r = ±√(r²), with the sign chosen as above.
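The relationship between r and r² can be sketched in a few lines: take the square root of r² (in decimal form) and attach the sign of the fitted slope. The values below are taken from the skin cancer mortality output (R-Sq = 68.0 %, slope -5.97764); the variable names are illustrative.

```python
import math

r_sq = 0.680        # R-Sq from the Minitab output, in decimal form
slope = -5.97764    # estimated slope of the fitted regression line

# r has the magnitude sqrt(r_sq) and the sign of the slope.
r = math.copysign(math.sqrt(r_sq), slope)

print(round(r, 3))  # -0.825
```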
11
Formulas for the Pearson correlation coefficient r
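The formulas on this slide did not survive as text. The standard textbook definitions are given here as a reference sketch; the notation is an assumption (x̄ and ȳ are the sample means, b₁ is the estimated slope of the fitted line):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = b_1 \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The second form follows from the first because b₁ = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)².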
12
What do we learn from the formulas for r?
The correlation coefficient r gets its sign from the slope b₁.
The correlation coefficient r is a unitless measure.
The correlation coefficient r = 0 when the estimated slope b₁ = 0 and vice versa.
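These three properties can be checked numerically. The sketch below uses a small made-up data set (the values are hypothetical, not from the presentation) and computes r both directly and as b₁ times the square root of the ratio of the x and y sums of squares.

```python
import math

def sums_of_squares(xs, ys):
    """Return (Sxy, Sxx, Syy) for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r(xs, ys):
    sxy, sxx, syy = sums_of_squares(xs, ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sxy, sxx, syy = sums_of_squares(x, y)
b1 = sxy / sxx                           # estimated slope
r = pearson_r(x, y)
r_from_slope = b1 * math.sqrt(sxx / syy)  # same value, written via the slope

# Unitless: rescaling x (e.g. metres to millimetres) leaves r unchanged.
r_rescaled = pearson_r([1000 * v for v in x], y)

# Sign follows the slope: flipping y flips both the slope and r.
r_flipped = pearson_r(x, [-v for v in y])

print(r, r_from_slope, r_rescaled, r_flipped)
```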
13
Interpretation of Pearson correlation coefficient r
There is no nice practical interpretation for r as there is for r².
r = -1 is perfect negative linear relationship.
r = 1 is perfect positive linear relationship.
r = 0 is no linear relationship.
For other r, how strong the relationship between x and y is deemed depends on the research area.
14
Pearson correlation coefficient r in Minitab

Correlations: Mort, Lat
Pearson correlation of Mort and Lat = -0.825

Correlations: Lat, Mort
Pearson correlation of Lat and Mort = -0.825
15
How strong is the linear relationship between Celsius and Fahrenheit? Pearson correlation of Celsius and Fahrenheit = 1.000
16
How strong is the linear relationship between # of stories and height? Pearson correlation of HEIGHT and STORIES = 0.951
17
How strong is the linear relationship between driver age and see distance? Pearson correlation of Distance and DrivAge = -0.801
18
How strong is the linear relationship between height and g.p.a.? Pearson correlation of height and gpa = -0.053
19
Caution #1 The correlation coefficient r quantifies the strength of a linear relationship. It is possible to get r = 0 with a perfect curvilinear relationship.
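A minimal sketch of this caution, using made-up data in which y is determined exactly by x through y = x², yet the Pearson correlation is zero because the relationship is symmetric rather than linear:

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect parabola centred on x = 0.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # zero: no linear association despite a perfect curve
```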
20
Example of Caution #1 Pearson correlation of x and y = 0.000
21
Clarification of Caution #1 Pearson correlation of x and y = 0.000
22
Caution #2 A large r² value should not be interpreted as meaning that the estimated regression line fits the data well. Another function might better describe the trend in the data.
23
Example of Caution #2 Pearson correlation of Year and USPopn = 0.959
24
Caution #3 The coefficient of determination r² and the correlation coefficient r can both be greatly affected by just one data point (or a few data points).
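The effect of a single point can be illustrated with a small made-up data set (hypothetical values, not the earthquake data from the slides): five points with no linear association, then the same five points plus one extreme observation.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with no linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 2.0, 1.0, 2.0]
r_without = pearson_r(x, y)

# Add one extreme point far from the rest.
r_with = pearson_r(x + [20.0], y + [20.0])

print(round(r_without, 3), round(r_with, 3))
```

One added observation moves r from essentially zero to a value close to 1, so r² changes just as sharply.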
25
Example of Caution #3 Pearson correlation of Deaths and Magnitude = 0.732
26
Example of Caution #3 Pearson correlation of Deaths and Magnitude = -0.960
27
Caution #4 Correlation (association) does not imply causation.
28
Example of Caution #4 Pearson correlation of Wine and Heart = -0.843
29
Caution #5 Ecological correlations are correlations that are based on rates or averages. Ecological correlations tend to overstate the strength of an association.
30
Example of Caution #5
Data from 1988 Current Population Survey
Treating individuals as the units
–Correlation between income and education for men age 25-64 in U.S. is r ≈ 0.4.
Treating nine regions as the units
–Compute average income and average education for men age 25-64 in each of the nine regions.
–Correlation between the average incomes and the average education in U.S. is r ≈ 0.7.
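The same effect can be reproduced on a toy scale. The sketch below uses made-up data (three hypothetical groups of three individuals each, not the Current Population Survey figures) and compares the correlation across individuals with the correlation across group averages.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical groups: each is a list of (x, y) pairs for individuals.
groups = [
    [(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)],
    [(2.0, 4.0), (3.0, 3.0), (4.0, 2.0)],
    [(4.0, 6.0), (5.0, 5.0), (6.0, 4.0)],
]

# Individuals as the units.
all_x = [x for g in groups for x, _ in g]
all_y = [y for g in groups for _, y in g]
r_individuals = pearson_r(all_x, all_y)

# Group averages as the units.
avg_x = [sum(x for x, _ in g) / len(g) for g in groups]
avg_y = [sum(y for _, y in g) / len(g) for g in groups]
r_averages = pearson_r(avg_x, avg_y)

print(round(r_individuals, 3), round(r_averages, 3))  # 0.6 1.0
```

Averaging removes the within-group scatter, which is why the group-level correlation is stronger than the individual-level one.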
31
Example of Caution #5
33
Caution #6 A “statistically significant” r² does not imply that the slope β₁ is meaningfully different from 0.
34
Caution #7 A large r² does not necessarily mean that a useful prediction of the response y_new (or estimation of the mean response μ_Y) can be made. It is still possible to get prediction (or confidence) intervals that are too wide to be useful.
35
Using the sample correlation r to learn about the population correlation ρ
36
Translating a research question into a statistical procedure
Is there a linear relationship between skin cancer mortality and latitude?
–t-test for testing H₀: β₁ = 0
–ANOVA F-test for testing H₀: β₁ = 0
Is there a linear correlation between husband’s age and wife’s age?
–t-test for testing population correlation coefficient H₀: ρ = 0
37
Where does this topic fit in? Model formulation, model estimation, model evaluation, model use.
38
Is there a linear correlation between husband’s age and wife’s age? Pearson correlation of HAge and WAge = 0.939
39
Is there a linear correlation between husband’s age and wife’s age? Pearson correlation of WAge and HAge = 0.939
40
The formal t-test for correlation coefficient ρ
Null hypothesis H₀: ρ = 0
Alternative hypothesis H_A: ρ ≠ 0 or ρ < 0 or ρ > 0
Test statistic: t* = r√(n - 2) / √(1 - r²)
P-value = What is the probability that we’d get a t* statistic as extreme as we did, if the null hypothesis is true?
The P-value is determined by comparing t* to a t distribution with n - 2 degrees of freedom.
41
Is there a linear correlation between husband’s age and wife’s age?
Test statistic: t* = 35.39

Help in determining the P-value:
Student's t distribution with 168 DF
x        P( X <= x )
35.3900  1.0000

Just let Minitab do the work:
Pearson correlation of WAge and HAge = 0.939
P-Value = 0.000
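The test statistic for H₀: ρ = 0 is t* = r√(n - 2) / √(1 - r²). As a minimal sketch, the value above can be reproduced from the reported r = 0.939 and the 168 degrees of freedom (so n = 170). The function name is illustrative, and the P-value lookup itself is left to statistical software such as Minitab.

```python
import math

def correlation_t_statistic(r, n):
    """t* for testing H0: rho = 0, compared to a t distribution with n - 2 DF."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.939   # Pearson correlation of WAge and HAge
n = 170     # 168 degrees of freedom plus 2

t_star = correlation_t_statistic(r, n)
print(round(t_star, 2))  # 35.39
```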
42
When is it okay to use the t-test for testing H₀: ρ = 0?
When it is not obvious which variable is the response.
When the (x, y) pairs are a random sample from a bivariate normal population.
–For each x, the y’s are normal with equal variances.
–For each y, the x’s are normal with equal variances.
–Either, y can be considered a linear function of x.
–Or, x can be considered a linear function of y.
The (x, y) pairs are independent.
43
The three tests will always yield similar results.

Pearson correlation of WAge and HAge = 0.939
P-Value = 0.000

The regression equation is
HAge = 3.59 + 0.967 WAge

170 cases used 48 cases contain missing values

Predictor     Coef  SE Coef      T      P
Constant     3.590    1.159   3.10  0.002
WAge       0.96670  0.02742  35.25  0.000

S = 4.069   R-Sq = 88.1%   R-Sq(adj) = 88.0%

Analysis of Variance
Source       DF     SS     MS        F      P
Regression    1  20577  20577  1242.51  0.000
Error       168   2782     17
Total       169  23359
44
The three tests will always yield similar results.

The regression equation is
WAge = 1.57 + 0.911 HAge

170 cases used 48 cases contain missing values

Predictor     Coef  SE Coef      T      P
Constant     1.574    1.150   1.37  0.173
HAge       0.91124  0.02585  35.25  0.000

S = 3.951   R-Sq = 88.1%   R-Sq(adj) = 88.0%

Analysis of Variance
Source       DF     SS     MS        F      P
Regression    1  19396  19396  1242.51  0.000
Error       168   2623     16
Total       169  22019

Pearson correlation of WAge and HAge = 0.939
P-Value = 0.000
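The agreement between the three tests is not a coincidence. In simple linear regression the square of the slope's t statistic equals the ANOVA F statistic, and the product of the two fitted slopes (y on x and x on y) equals r². These are standard identities rather than statements from the slides; the sketch below checks them against the rounded numbers in the Minitab output, so the matches are only approximate.

```python
# Values copied from the two Minitab regression outputs above.
t_slope = 35.25          # T for the slope (identical in both regressions)
f_anova = 1242.51        # ANOVA F (identical in both regressions)
r = 0.939                # Pearson correlation of WAge and HAge
r_sq_reported = 0.881    # R-Sq = 88.1%
slope_hage_on_wage = 0.96670
slope_wage_on_hage = 0.91124

t_squared = t_slope ** 2
r_squared = r ** 2
slope_product = slope_hage_on_wage * slope_wage_on_hage

print(round(t_squared, 2))      # 1242.56, close to F = 1242.51
print(round(r_squared, 3))      # 0.882, close to R-Sq = 0.881
print(round(slope_product, 3))  # 0.881
```

The small discrepancies come from rounding in the printed output, not from any difference between the tests.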
45
Which results should I report?
If one of the variables can be clearly identified as the response, report the t-test or F-test results for testing H₀: β₁ = 0.
–Does it make sense to use x to predict y?
If it is not obvious which variable is the response, report the t-test results for testing H₀: ρ = 0.
–Does it only make sense to look for an association between x and y?